In "Distance between pattern and strings" we computed the distance $$d(p, \mathcal{C}_{\text{DNA}})$$ between a $$k$$-mer $$p$$ and a collection of DNA strings $$\mathcal{C}_{\text{DNA}}$$. We now look for a $$k$$-mer $$p$$ that minimizes $$d(p, \mathcal{C}_{\text{DNA}})$$ over all $$k$$-mers, which is the same goal pursued by the Equivalent Motif Finding problem. Such a $$k$$-mer is called a median string for $$\mathcal{C}_{\text{DNA}}$$.
In the following interactive session, we assume that the FASTA files data01.fna, data02.fna, data03.fna and data04.fna are located in the current directory.
>>> median_string(3, 'data01.fna')
{'GAC', 'ACG'}
>>> median_string(3, 'data02.fna')
{'CGT', 'ACG'}
>>> median_string(3, 'data03.fna')
{'AAA'}
>>> median_string(3, 'data04.fna')
{'AAG', 'AAT'}
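
The search for a median string can be sketched as a brute-force scan over all $$4^k$$ candidate $$k$$-mers. The sketch below is only an illustration under stated assumptions: the helper names `hamming`, `distance` and `median_kmers` are hypothetical, the sequences are passed as a list of strings rather than read from a FASTA file, $$d(p, \mathcal{C}_{\text{DNA}})$$ is taken to be the sum over all strings of the smallest Hamming distance between $$p$$ and any $$k$$-mer of that string, and every string is assumed to be at least $$k$$ characters long.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def distance(pattern, sequences):
    """Sum, over all sequences, of the smallest Hamming distance between
    the pattern and any k-mer of that sequence (assumed definition)."""
    k = len(pattern)
    total = 0
    for seq in sequences:
        # Assumes len(seq) >= k; otherwise min() has nothing to compare.
        total += min(
            hamming(pattern, seq[i:i + k]) for i in range(len(seq) - k + 1)
        )
    return total


def median_kmers(k, sequences):
    """Set of all k-mers over ACGT with minimal distance to the sequences."""
    best, best_distance = set(), None
    for letters in product('ACGT', repeat=k):
        candidate = ''.join(letters)
        d = distance(candidate, sequences)
        if best_distance is None or d < best_distance:
            # Strictly better candidate: restart the set of medians.
            best, best_distance = {candidate}, d
        elif d == best_distance:
            # Tie: keep every k-mer that reaches the minimum.
            best.add(candidate)
    return best
```

For example, `median_kmers(3, ['ACGT', 'CGTA'])` yields `{'CGT'}`, since `CGT` is the only 3-mer that occurs in both strings. Returning a set mirrors the session above, where several $$k$$-mers can reach the same minimal distance. The running time grows as $$4^k$$, so this approach is only practical for small $$k$$.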