Consider this greedy approach for finding $$k$$-mer motifs in a collection of DNA strings $$\mathcal{C}_\text{DNA}$$.

        GreedyMotifSearch(k, CDNA)
    BestMotifs ← motif matrix formed by first k-mers in each string from CDNA
    for each k-mer Motif in the first string from CDNA
        Motif1 ← Motif
        for i = 2 to t
            form Profile from motifs Motif1, …, Motifi - 1
            Motifi ← Profile-most probable k-mer in the i-th string in CDNA
        Motifs ← (Motif1, …, Motift)
        if Score(Motifs) < Score(BestMotifs)
            BestMotifs ← Motifs
    return BestMotifs
      

Assignment

Write a function greedy_motif_search that takes an integer $$k$$ and the location of a FASTA file containing a collection of DNA strings $$\mathcal{C}_\text{DNA}$$. The function must return a tuple containing the $$k$$-mers resulting from a greedy motif search in $$\mathcal{C}_\text{DNA}$$. If at any step the function finds more than one $$\mathcal{P}$$-most probable $$k$$-mers in a given DNA string, it must use the one occurring first (the leftmost one).

Example

In the following interactive session, we assume the FASTA files data01.fna¹, data02.fna², data03.fna³, data04.fna⁴ and data05.fna⁵ to be located in the current directory.

        >>> greedy_motif_search(3, 'data01.fna')
('CAG', 'CAG', 'CAA', 'CAA', 'CAA')
>>> greedy_motif_search(3, 'data02.fna')
('GCC', 'GCC', 'AAC', 'TTC')
>>> greedy_motif_search(5, 'data03.fna')
('GAGGC', 'TCATC', 'TCGGC', 'GAGTC', 'GCAGC', 'GCGGC', 'GCGGC', 'GCATC')
>>> greedy_motif_search(6, 'data04.fna')
('GTGCGT', 'GTGCGT', 'GCGCCA', 'GTGCCA', 'GCGCCA')
>>> greedy_motif_search(5, 'data05.fna')
('GCAGC', 'TCATT', 'GGAGT', 'TCATC', 'GCATC', 'GCATC', 'GGTAT', 'GCAAC')