Given a $$k$$-mer $$p$$ and a longer string $$s$$, we use $$d(p, s)$$ to denote the minimum Hamming distance between $$p$$ and any $$k$$-mer in $$s$$.

\[ d(p, s) = \min_{\text{all } k\text{-mers } p' \text{ in } s} \text{hammingDistance}(p, p') \]

Given a $$k$$-mer $$p$$ and a collection of DNA strings $$\mathcal{C}_\text{DNA} = \left\{s_1, \ldots, s_n\right\}$$, we define $$d(p, \mathcal{C}_\text{DNA})$$ as the sum of the distances between $$p$$ and each string in $$\mathcal{C}_\text{DNA}$$.

\[ d(p, \mathcal{C}_\text{DNA}) = \sum_{i=1}^{n} d(p, s_i) \]
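The two definitions above translate almost literally into code. The following is a minimal sketch rather than a reference solution; the helper names hamming_distance, min_kmer_distance and total_distance are hypothetical and chosen only for illustration, and we assume that $$s$$ is at least as long as $$p$$.

```python
def hamming_distance(p, q):
    """Number of positions at which the equal-length strings p and q differ."""
    if len(p) != len(q):
        raise ValueError('strings must have equal length')
    return sum(a != b for a, b in zip(p, q))


def min_kmer_distance(p, s):
    """d(p, s): minimum Hamming distance between p and any k-mer of s.

    Assumes len(s) >= len(p); otherwise s has no k-mers and min() raises
    a ValueError on the empty sequence.
    """
    k = len(p)
    return min(hamming_distance(p, s[i:i + k]) for i in range(len(s) - k + 1))


def total_distance(p, strings):
    """d(p, C): sum of d(p, s) over all strings s in the collection."""
    return sum(min_kmer_distance(p, s) for s in strings)


# the 3-mer AAA occurs literally in CCCTAAAGAG, so the distance is 0
print(min_kmer_distance('AAA', 'CCCTAAAGAG'))

# a small in-memory collection; each string contributes its own minimum
print(total_distance('AAA', ['TTACCTTAAC', 'CCCTAAAGAG']))
```

Sliding a window of length $$k$$ over $$s$$ visits every $$k$$-mer exactly once, so the minimum is taken over $$|s| - k + 1$$ candidates.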

Assignment

Write a function distance_to_string that takes a DNA string $$p$$ and a longer DNA string $$s$$. The function must return $$d(p, s)$$.
Write a function distance_to_strings that takes a DNA string $$p$$ and the location of a FASTA file containing a collection of DNA strings $$\mathcal{C}_\text{DNA}$$. The function must return $$d(p, \mathcal{C}_\text{DNA})$$.
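One possible way to organise the two functions is sketched below. It is not the only valid approach. The helper read_fasta is a hypothetical name, and the sketch assumes a simple FASTA layout: every record starts with a header line beginning with >, followed by one or more lines of sequence data that are concatenated into a single string.

```python
def read_fasta(path):
    """Yield the sequences in a FASTA file, one string per record.

    Assumption: header lines start with '>' and a sequence may be spread
    over several lines; blank lines are ignored.
    """
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                # a new header closes the previous record, if any
                if chunks:
                    yield ''.join(chunks)
                    chunks = []
            else:
                chunks.append(line)
    # the last record has no header after it
    if chunks:
        yield ''.join(chunks)


def distance_to_string(p, s):
    """d(p, s): minimum Hamming distance between p and any k-mer in s."""
    k = len(p)
    return min(
        sum(a != b for a, b in zip(p, s[i:i + k]))
        for i in range(len(s) - k + 1)
    )


def distance_to_strings(p, path):
    """d(p, C): sum of d(p, s) over all sequences s in the FASTA file."""
    return sum(distance_to_string(p, s) for s in read_fasta(path))
```

Writing read_fasta as a generator means the sequences are processed one at a time, so the whole file never has to be held in memory.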

Example

In the following interactive session, we assume that the FASTA files data01.fna and data02.fna are located in the current directory.

>>> distance_to_string('AAA', 'TTACCTTAAC')
1
>>> distance_to_string('AAA', 'GATATCTGTC')
1
>>> distance_to_string('AAA', 'ACGGCGTTCG')
2
>>> distance_to_string('AAA', 'CCCTAAAGAG')
0
>>> distance_to_string('AAA', 'CGTCAGAGGT')
1

>>> distance_to_strings('AAA', 'data01.fna')
5
>>> distance_to_strings('TAA', 'data02.fna')
3