Given a $$k$$-mer $$p$$ and a longer string $$s$$, we use $$d(p, s)$$ to denote the minimum Hamming distance between $$p$$ and any $$k$$-mer in $$s$$.
\[ d(p, s) = \min\limits_{\text{all } k\text{-mers } p' \text{ in } s}{\text{hammingDistance}(p, p')} \]
Given a $$k$$-mer $$p$$ and a collection of DNA strings $$\mathcal{C}_\text{DNA} = \left\{s_1, \ldots, s_n\right\}$$, we define $$d(p, \mathcal{C}_\text{DNA})$$ as the sum of the distances between $$p$$ and all strings in $$\mathcal{C}_\text{DNA}$$.
\[ d(p, \mathcal{C}_\text{DNA}) = \sum_{i=1}^{n}d(p, s_i) \]
Write a function distance_to_string that takes a DNA string $$p$$ and a longer DNA string $$s$$. The function must return $$d(p, s)$$.
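One possible approach, given here as a minimal sketch rather than a reference solution, is to slide a window of length $$k$$ over $$s$$ and keep the smallest Hamming distance seen. The helper name hamming_distance is an assumption made for this illustration, and the sketch assumes that $$s$$ is at least as long as $$p$$ and that both strings use the same letter case.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ.

    Hypothetical helper, not part of the assignment.
    """
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_to_string(p, s):
    """Return d(p, s): the minimum Hamming distance between p and any k-mer in s.

    Assumes len(s) >= len(p); otherwise s contains no k-mers and min()
    raises a ValueError on the empty sequence.
    """
    k = len(p)
    # every k-mer of s starts at an index in 0 .. len(s) - k
    return min(hamming_distance(p, s[i:i + k]) for i in range(len(s) - k + 1))
```

For the $$k$$-mer AAA and the string TTACCTTAAC, the closest 3-mers are TAA and AAC, each one mismatch away from AAA, so the distance is 1.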
Write a function distance_to_strings that takes a DNA string $$p$$ and the location of a FASTA file containing a collection of DNA strings $$\mathcal{C}_\text{DNA}$$. The function must return $$d(p, \mathcal{C}_\text{DNA})$$.
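The sum over a FASTA file could be sketched as follows, assuming the usual FASTA layout in which each record starts with a header line beginning with > and its sequence may be spread over several lines. The generator read_fasta is a hypothetical helper introduced for this illustration, and distance_to_string is repeated in compact form only to keep the sketch self-contained.

```python
def distance_to_string(p, s):
    """Return d(p, s), assuming len(s) >= len(p)."""
    k = len(p)
    return min(
        sum(a != b for a, b in zip(p, s[i:i + k]))
        for i in range(len(s) - k + 1)
    )


def read_fasta(path):
    """Yield every sequence in a FASTA file as one string.

    Hypothetical helper: header lines (starting with '>') separate records,
    and the lines of a multi-line sequence are concatenated.
    """
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                if chunks:
                    yield ''.join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    # the last record has no header after it
    if chunks:
        yield ''.join(chunks)


def distance_to_strings(p, path):
    """Return d(p, C_DNA): the sum of d(p, s) over all strings s in the file."""
    return sum(distance_to_string(p, s) for s in read_fasta(path))
```

Reading the file with a generator keeps only one sequence in memory at a time, which can matter for large collections.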
In the following interactive session, we assume the FASTA files data01.fna and data02.fna to be located in the current directory.
>>> distance_to_string('AAA', 'TTACCTTAAC')
1
>>> distance_to_string('AAA', 'GATATCTGTC')
1
>>> distance_to_string('AAA', 'ACGGCGTTCG')
2
>>> distance_to_string('AAA', 'CCCTAAAGAG')
0
>>> distance_to_string('AAA', 'CGTCAGAGGT')
1
>>> distance_to_strings('AAA', 'data01.fna')
5
>>> distance_to_strings('TAA', 'data02.fna')
3