There are three different ways to divide a DNA string into codons for translation, one starting at each of the first three positions of the string. These different ways of dividing a DNA string into codons are called reading frames. Since DNA is double-stranded, a genome has six reading frames (three on each strand), as shown in the figure below.
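The six reading frames can be enumerated with a short sketch. The helper names `codons`, `reverse_complement` and `six_frames` are illustrative and not part of the assignment, and the code assumes an uppercase DNA string over the alphabet A, C, G, T.

```python
def codons(dna, frame):
    """Split dna into complete codons, starting at offset frame (0, 1 or 2)."""
    # Trailing bases that do not fill a whole codon are dropped.
    return [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Map (strand, frame) to the codon list of that reading frame."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            frames[(strand, frame)] = codons(seq, frame)
    return frames


frames = six_frames("ATGGCCATG")
print(frames[("+", 0)])  # ['ATG', 'GCC', 'ATG']
print(frames[("-", 0)])  # ['CAT', 'GGC', 'CAT']
```

The three frames on the reverse strand are read from the reverse complement, so they run in the opposite direction along the genome.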
We say that a DNA string $$s$$ encodes an amino acid string $$p$$ if the RNA string transcribed from either $$s$$ or its reverse complement $$\bar{s}$$ translates into $$p$$.
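As a minimal sketch of this definition, the check below transcribes a DNA string (T becomes U), translates the RNA with the standard genetic code, and does the same for the reverse complement. The names `transcribe`, `translate` and `encodes` are illustrative. Two assumptions are made here: the input is uppercase over A, C, G, T, and translation halts at the first stop codon without emitting a symbol for it.

```python
from itertools import product

# Standard genetic code on RNA codons, listed in UCAG order; '*' marks a stop codon.
RNA_CODON_TABLE = dict(zip(
    ("".join(c) for c in product("UCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def transcribe(dna):
    """Transcribe the coding strand into RNA by replacing T with U."""
    return dna.replace("T", "U")


def translate(rna):
    """Translate complete codons until the first stop codon (assumption)."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = RNA_CODON_TABLE[rna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def encodes(s, p):
    """True if s or its reverse complement translates into p."""
    forward = translate(transcribe(s))
    backward = translate(transcribe(reverse_complement(s)))
    return p in (forward, backward)


print(encodes("ATGGCC", "MA"))  # True: ATG -> M, GCC -> A
print(encodes("GGCCAT", "MA"))  # True: the reverse complement is ATGGCC
print(encodes("ATGTGG", "MA"))  # False: translates into MW
```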
Write a function peptide_matches that takes an amino acid string $$p$$ and the location of a FASTA file containing a DNA string $$g$$. The function must return a set containing all substrings of $$g$$ encoding amino acid string $$p$$. Each of these substrings is represented as a tuple ($$x$$, $$y$$, $$s$$) containing the following elements:
the position $$x \in \mathbb{N}$$ of the first base of $$s$$ in $$g$$
the position $$y \in \mathbb{N}$$ following the last base of $$s$$ in $$g$$
the DNA string $$s$$ that translates into $$p$$
All positions in the genome $$g$$ are zero-indexed. Substrings that encode $$p$$ on the forward strand have $$x < y$$. Substrings that encode $$p$$ on the backward strand (the reverse complement of $$g$$) have $$x > y$$.
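One possible reading of this coordinate convention is sketched below on an in-memory string rather than a FASTA file. The helper `matches_in_string` is hypothetical, and so is the interpretation of backward-strand positions: for a window `g[i:i + n]` whose reverse complement translates into $$p$$, the sketch reports $$x = i + n - 1$$ (the genome position of the first base of $$s$$) and $$y = i - 1$$ (the position reached after stepping past its last base). Under this reading a backward match that ends at the very start of the genome would give $$y = -1$$; the text does not say how that case should be handled.

```python
from itertools import product

# Standard genetic code on DNA codons, listed in TCAG order; '*' marks a stop codon.
CODON_TABLE = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; stops appear as '*' and unknown codons as '?'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "?") for i in range(0, len(dna) - 2, 3)
    )


def matches_in_string(p, g):
    """Collect (x, y, s) tuples for every window of g that encodes p."""
    found = set()
    size = 3 * len(p)
    if size == 0:
        return found
    for i in range(len(g) - size + 1):
        window = g[i:i + size]
        # Forward strand: x < y, s is the window itself.
        if translate(window) == p:
            found.add((i, i + size, window))
        # Backward strand: x > y, s is the reverse complement of the window.
        rc = reverse_complement(window)
        if translate(rc) == p:
            found.add((i + size - 1, i - 1, rc))
    return found


# Hypothetical 12-base genome, chosen for illustration only.
example = matches_in_string("MA", "ATGGCCATGGCC")
```

For this illustrative genome, `example` equals `{(0, 6, 'ATGGCC'), (6, 12, 'ATGGCC'), (7, 1, 'ATGGCC')}`: two matches on the forward strand and one on the backward strand, where the window `g[2:8]` is `GGCCAT`. Translating DNA codons directly gives the same result as transcribing to RNA first, since transcription only replaces T with U.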
In the following interactive session, we assume that the FASTA file data01.fna is located in the current directory.
>>> peptide_matches('MA', 'data01.fna')
{(6, 12, 'ATGGCC'), (0, 6, 'ATGGCC'), (7, 1, 'ATGGCC')}
The stop codon should not be translated, as shown in the sample dataset.
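Reading the DNA string $$g$$ from the FASTA file can be done without external libraries. The sketch below uses a hypothetical helper `read_fasta_sequence` and assumes that the file holds a single record: a header line starting with `>` followed by one or more sequence lines. If more records are present, only the first one is returned.

```python
def read_fasta_sequence(path):
    """Return the sequence of the first FASTA record as one uppercase string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if seen_header:
                    break  # a second record starts here: stop reading
                seen_header = True
                continue
            parts.append(line.upper())
    return "".join(parts)
```

Joining the sequence lines matters because a match may span a line break in the file.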