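Given a collection of DNA strings $$\mathcal{C}_{\text{DNA}}$$ and integers $$k$$ and $$d$$, a $$k$$-mer is a ($$k$$, $$d$$)-motif if it appears in every string of $$\mathcal{C}_{\text{DNA}}$$ with at most $$d$$ mismatches. In other words, every string in $$\mathcal{C}_{\text{DNA}}$$ contains a substring of length $$k$$ that differs from the $$k$$-mer in at most $$d$$ positions.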
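The definition can be made concrete with a small Python sketch. It assumes that mismatches are counted as the Hamming distance between the $$k$$-mer and a length-$$k$$ window of a string. The helper names `hamming_distance`, `occurs_with_mismatches` and `is_motif`, as well as the collection `dna`, are hypothetical and only serve to illustrate the definition.

```python
def hamming_distance(p, q):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(p, q))


def occurs_with_mismatches(pattern, text, d):
    """True if some length-k window of text is within d mismatches of pattern."""
    k = len(pattern)
    return any(
        hamming_distance(pattern, text[i:i + k]) <= d
        for i in range(len(text) - k + 1)
    )


def is_motif(pattern, strings, d):
    """True if pattern occurs with at most d mismatches in every string."""
    return all(occurs_with_mismatches(pattern, s, d) for s in strings)


# Hypothetical collection, chosen only to illustrate the definition.
dna = ["ATTTGGC", "TGCCTTA", "CGGTATC", "GAAAATT"]
print(is_motif("ATA", dna, 1))  # True: ATT, TTA, ATC and AAA are each one mismatch away
print(is_motif("CCC", dna, 1))  # False: no window of ATTTGGC is within one mismatch
```

Note that a motif does not have to occur exactly in any of the strings. `ATA` qualifies here even though none of the four strings contains it without a mismatch.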

Assignment

Write a function motif_enumeration that takes three arguments: i) the location of a FASTA file containing a collection of DNA strings $$\mathcal{C}_{\text{DNA}}$$, ii) an integer $$k$$ and iii) an integer $$d$$. The function must return a set containing all ($$k$$, $$d$$)-motifs of $$\mathcal{C}_{\text{DNA}}$$.
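One possible approach is sketched below. It is not necessarily the intended reference solution, and it avoids testing all $$4^k$$ possible $$k$$-mers. Every ($$k$$, $$d$$)-motif must lie within $$d$$ mismatches of some $$k$$-mer of the first string. It therefore suffices to generate the $$d$$-neighbourhood of each $$k$$-mer in the first string and keep the candidates that occur with at most $$d$$ mismatches in every string.

The sketch rests on a few assumptions. The input is plain FASTA over the alphabet ACGT, and it is parsed by hand rather than with a library such as Biopython. The helper names `read_fasta`, `hamming_distance`, `occurs_with_mismatches` and `neighbors` are hypothetical.

```python
from itertools import combinations, product

NUCLEOTIDES = "ACGT"  # assumption: sequences use only these four bases


def read_fasta(path):
    """Return the sequences in a FASTA file as a list of uppercase strings.

    Minimal parser: lines starting with '>' are headers; the (possibly
    wrapped) sequence lines that follow a header are concatenated.
    """
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current).upper())
                current = []
            elif line:
                current.append(line)
    if current:
        sequences.append("".join(current).upper())
    return sequences


def hamming_distance(p, q):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(p, q))


def occurs_with_mismatches(pattern, text, d):
    """True if some length-k window of text is within d mismatches of pattern."""
    k = len(pattern)
    return any(hamming_distance(pattern, text[i:i + k]) <= d
               for i in range(len(text) - k + 1))


def neighbors(pattern, d):
    """All strings over ACGT within Hamming distance d of pattern.

    Picks min(d, k) positions and substitutes every base at them (the
    original base included), which covers every string with <= d mismatches.
    """
    result = set()
    for positions in combinations(range(len(pattern)), min(d, len(pattern))):
        for bases in product(NUCLEOTIDES, repeat=len(positions)):
            chars = list(pattern)
            for pos, base in zip(positions, bases):
                chars[pos] = base
            result.add("".join(chars))
    return result


def motif_enumeration(path, k, d):
    """Return the set of all (k, d)-motifs of the DNA strings in a FASTA file."""
    dna = read_fasta(path)
    if not dna:
        return set()
    # Every motif lies within d mismatches of some k-mer of the first string,
    # so the neighbourhoods of those k-mers contain all candidates.
    first = dna[0]
    candidates = set()
    for i in range(len(first) - k + 1):
        candidates |= neighbors(first[i:i + k], d)
    # Keep only the candidates that occur approximately in every string.
    return {pattern for pattern in candidates
            if all(occurs_with_mismatches(pattern, s, d) for s in dna)}
```

The neighbourhood of a single $$k$$-mer contains $$\sum_{i=0}^{d} \binom{k}{i} 3^i$$ strings. The approach is therefore practical for small values of $$k$$ and $$d$$, such as those used in the examples, but the candidate set grows quickly as $$d$$ increases.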

Example

In the following interactive session, we assume the FASTA files data01.fna and data02.fna to be located in the current directory.

>>> motif_enumeration('data01.fna', 3, 1)
{'TTT', 'GTT', 'ATA', 'ATT'}
>>> motif_enumeration('data02.fna', 5, 1)
{'AAGCA', 'TGCAT', 'ACGCA', 'CGGTA', 'AGCAT', 'GCATA', 'CATGC', 'CAGGA', 'ATGCA', 'AGGCA'}