Consider this greedy approach for finding $$k$$-mer motifs in a collection of DNA strings $$\mathcal{C}_\text{DNA}$$.

GreedyMotifSearch(k, CDNA)
    BestMotifs ← motif matrix formed by first k-mers in each string from CDNA
    for each k-mer Motif in the first string from CDNA
        Motif1Motif
        for i = 2 to t
            form Profile from motifs Motif1, …, Motifi - 1
            MotifiProfile-most probable k-mer in the i-th string in CDNA
        Motifs ← (Motif1, …, Motift)
        if Score(Motifs) < Score(BestMotifs)
            BestMotifs Motifs
    return BestMotifs

Assignment

Write a function greedy_motif_search that takes an integer $$k$$ and the location of a FASTA file containing a collection of DNA strings $$\mathcal{C}_\text{DNA}$$. The function must return a tuple containing the $$k$$-mers resulting from a greedy motif search in $$\mathcal{C}_\text{DNA}$$. If at any step the function finds more than one $$\mathcal{P}$$-most probable $$k$$-mers in a given DNA string, it must use the one occurring first (the leftmost one).

Example

In the following interactive session, we assume the FASTA files data01.fna1, data02.fna2, data03.fna3, data04.fna4 and data05.fna5 to be located in the current directory.

>>> greedy_motif_search(3, 'data01.fna')
('CAG', 'CAG', 'CAA', 'CAA', 'CAA')
>>> greedy_motif_search(3, 'data02.fna')
('GCC', 'GCC', 'AAC', 'TTC')
>>> greedy_motif_search(5, 'data03.fna')
('GAGGC', 'TCATC', 'TCGGC', 'GAGTC', 'GCAGC', 'GCGGC', 'GCGGC', 'GCATC')
>>> greedy_motif_search(6, 'data04.fna')
('GTGCGT', 'GTGCGT', 'GCGCCA', 'GTGCCA', 'GCGCCA')
>>> greedy_motif_search(5, 'data05.fna')
('GCAGC', 'TCATT', 'GGAGT', 'TCATC', 'GCATC', 'GCATC', 'GGTAT', 'GCAAC')