We say that a $$k$$-mer is shared by two genomes if either the $$k$$-mer or its reverse complement appears in each genome. The figure below shows the four pairs of 3-mers that are shared by AAACTCATC and TTTCAAATC.

shared k-mers
The shared 3-mers of AAACTCATC and TTTCAAATC. The second pair is shown in blue because its two 3-mers are reverse complements of each other. The four pairs of shared 3-mers can be represented by the ordered pairs (0, 4), (0, 0), (4, 2) and (6, 6).
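The definition of a shared $$k$$-mer can be checked directly with a few lines of Python. The following is only a minimal sketch: the helper names reverse_complement, occurs and is_shared are hypothetical (they are not part of the assignment), and the sequences are assumed to contain only the uppercase letters A, C, G and T.

```python
# Hypothetical helpers, assuming uppercase DNA over the alphabet A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the resulting string."""
    return dna.translate(COMPLEMENT)[::-1]


def occurs(kmer, genome):
    """True if the k-mer or its reverse complement is a substring of genome."""
    return kmer in genome or reverse_complement(kmer) in genome


def is_shared(kmer, first, second):
    """True if the k-mer (or its reverse complement) appears in both genomes."""
    return occurs(kmer, first) and occurs(kmer, second)
```

With these helpers, is_shared('AAA', 'AAACTCATC', 'TTTCAAATC') holds, whereas CTC occurs in the first genome only and is therefore not shared.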

A shared $$k$$-mer can be represented by an ordered pair $$(x, y)$$, where $$x$$ is the starting position of the $$k$$-mer in the first genome and $$y$$ is the starting position of the $$k$$-mer in the second genome. For the genomes AAACTCATC and TTTCAAATC, these shared 3-mers are $$(0, 4)$$, $$(0, 0)$$, $$(4, 2)$$ and $$(6, 6)$$.
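One possible way to enumerate these ordered pairs for two sequences that are already in memory is to index the $$k$$-mers of the second genome by their starting positions and then look up each $$k$$-mer of the first genome, together with its reverse complement. This is a sketch rather than a reference solution: the function name shared_positions is hypothetical, and the input is assumed to be uppercase DNA over A, C, G and T.

```python
from collections import defaultdict

# Assumption: sequences are uppercase strings over the alphabet A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the resulting string."""
    return dna.translate(COMPLEMENT)[::-1]


def shared_positions(k, v, w):
    """Return the set of pairs (x, y) of k-mers shared by v and w."""
    # Map each k-mer of w to the list of positions where it starts.
    positions = defaultdict(list)
    for y in range(len(w) - k + 1):
        positions[w[y:y + k]].append(y)

    shared = set()
    for x in range(len(v) - k + 1):
        kmer = v[x:x + k]
        # A set avoids a double lookup when a k-mer equals its reverse complement.
        for candidate in {kmer, reverse_complement(kmer)}:
            for y in positions.get(candidate, ()):
                shared.add((x, y))
    return shared
```

For the two genomes above, shared_positions(3, 'AAACTCATC', 'TTTCAAATC') yields the four pairs $$(0, 0)$$, $$(0, 4)$$, $$(4, 2)$$ and $$(6, 6)$$. Indexing one genome first keeps the work roughly proportional to the sequence lengths plus the number of reported pairs, instead of comparing every position in $$v$$ with every position in $$w$$.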

Assignment

Write a function shared_kmers that takes two arguments: i) an integer $$k \in \mathbb{N}_0$$ and ii) the location of a FASTA file containing two DNA sequences $$v$$ and $$w$$. The function must return a set containing all $$k$$-mers shared by $$v$$ and $$w$$. Each shared $$k$$-mer is represented as a tuple $$(x, y)$$ holding the starting positions of the $$k$$-mer in $$v$$ and $$w$$.
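Reading the two sequences from the FASTA file is a separate step from finding the shared $$k$$-mers. The sketch below shows one way to do it with plain Python. The exact layout of the input file is not shown here, so the code assumes the usual FASTA convention: each record starts with a header line beginning with >, followed by one or more lines of sequence data. The function name read_fasta is hypothetical.

```python
def read_fasta(path):
    """Return the sequences in a FASTA file, in order of appearance.

    Assumes header lines start with '>' and that the sequence of a record
    may be wrapped over several lines. Sequences are converted to uppercase.
    """
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records.append([])  # a header opens a new record
            elif records:
                records[-1].append(line.upper())
    return ["".join(parts) for parts in records]
```

Under these assumptions, v, w = read_fasta(location) unpacks the two sequences, and raises a ValueError if the file does not contain exactly two records.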

Example

In the following interactive session, we assume the FASTA file data01.fna to be located in the current directory.

>>> shared_kmers(3, 'data01.fna')
{(0, 0), (0, 4), (4, 2), (6, 6)}