Given a profile matrix $$\mathcal{P}$$, we can evaluate the probability of every $$k$$-mer in a string $$s$$ and find a $$\mathcal{P}$$-most probable $$k$$-mer in $$s$$, i.e., a $$k$$-mer that was most likely to have been generated by $$\mathcal{P}$$ among all $$k$$-mers in $$s$$. For example, if

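$$
\mathcal{P} =
\begin{array}{c|cccccccccccc}
\text{A} & .2 & .2 & .0 & .0 & .0 & .0 & .9 & .1 & .1 & .1 & .3 & .0 \\
\text{C} & .1 & .6 & .0 & .0 & .0 & .0 & .0 & .4 & .1 & .2 & .4 & .6 \\
\text{G} & .0 & .0 & 1. & 1. & .9 & .9 & .1 & .0 & .0 & .0 & .0 & .0 \\
\text{T} & .7 & .2 & .0 & .0 & .1 & .1 & .0 & .5 & .8 & .7 & .3 & .4
\end{array}
$$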

then ACGGGGATTACC is the $$\mathcal{P}$$-most probable 12-mer in GGTACGGGGATTACCT. Indeed, every other 12-mer in this string has probability 0.
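
The claim above can be checked with a few lines of Python. This is only a minimal sketch: the names `PROFILE`, `kmer_probability` and `most_probable_kmer` are illustrative and not part of the assignment, and the profile is assumed to be stored as a dictionary that maps each nucleotide to its row of column probabilities. The probability of a $$k$$-mer is taken to be the product of the profile entries selected by its nucleotides, one per column.

```python
from math import prod

# Profile matrix from the example: one row of column probabilities per nucleotide.
PROFILE = {
    'A': [.2, .2, .0, .0, .0, .0, .9, .1, .1, .1, .3, .0],
    'C': [.1, .6, .0, .0, .0, .0, .0, .4, .1, .2, .4, .6],
    'G': [.0, .0, 1., 1., .9, .9, .1, .0, .0, .0, .0, .0],
    'T': [.7, .2, .0, .0, .1, .1, .0, .5, .8, .7, .3, .4],
}


def kmer_probability(kmer, profile):
    """Multiply the profile entries selected by each nucleotide of the k-mer."""
    return prod(profile[nucleotide][column] for column, nucleotide in enumerate(kmer))


def most_probable_kmer(s, profile):
    """Return a k-mer of s with maximal probability (the first one on ties)."""
    k = len(profile['A'])  # k is the number of columns of the profile
    kmers = (s[i:i + k] for i in range(len(s) - k + 1))
    return max(kmers, key=lambda kmer: kmer_probability(kmer, profile))


best = most_probable_kmer('GGTACGGGGATTACCT', PROFILE)
print(best, kmer_probability(best, PROFILE))
```

Running this sketch selects ACGGGGATTACC, whose probability is roughly 0.00084. Each of the other four 12-mers hits at least one zero entry in the profile, so its product is 0.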

Assignment

Write a function profilemost_probable_kmer that takes the locations of two files: i) a FASTA file containing a DNA string $$s$$ and ii) a text file containing a profile matrix $$\mathcal{P}$$ with each row on a separate line and the values on each row separated by spaces. The function must return a $$\mathcal{P}$$-most probable $$k$$-mer in $$s$$, where $$k$$ is the number of columns of $$\mathcal{P}$$. If there are multiple $$\mathcal{P}$$-most probable $$k$$-mers in $$s$$, the function may return any one of them.
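
One possible shape for such a function is sketched below; it is not a reference solution. Several details are assumptions, because the description does not fix them: the FASTA file is assumed to hold a single record whose header line starts with `>`, the rows of the profile file are assumed to appear in the order A, C, G, T, and an optional leading row label (such as `A`) is tolerated. The helper names `read_fasta_sequence` and `read_profile` are hypothetical.

```python
from math import prod


def read_fasta_sequence(path):
    """Read a single-record FASTA file and return its sequence in upper case."""
    with open(path) as handle:
        # Skip header lines (starting with '>') and join the sequence lines.
        return ''.join(line.strip() for line in handle if not line.startswith('>')).upper()


def read_profile(path):
    """Read a profile matrix; rows are assumed to be in the order A, C, G, T."""
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    profile = {}
    for nucleotide, row in zip('ACGT', rows):
        # Assumption: a row may start with its nucleotide label, which is dropped.
        if row[0].isalpha():
            nucleotide, row = row[0].upper(), row[1:]
        profile[nucleotide] = [float(value) for value in row]
    return profile


def profilemost_probable_kmer(fasta_path, profile_path):
    """Return a profile-most probable k-mer of the DNA string in the FASTA file."""
    s = read_fasta_sequence(fasta_path)
    profile = read_profile(profile_path)
    k = len(profile['A'])  # k is the number of columns of the profile
    kmers = (s[i:i + k] for i in range(len(s) - k + 1))
    # max() keeps the first k-mer among ties, which the assignment allows.
    return max(kmers, key=lambda kmer: prod(profile[n][j] for j, n in enumerate(kmer)))
```

The sketch assumes that $$s$$ contains only the letters A, C, G and T and is at least $$k$$ characters long; other input raises an exception rather than being handled.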

Example

In the following interactive session, we assume the FASTA files data01.fna, data02.fna and data03.fna and the text files data01.prof, data02.prof and data03.prof to be located in the current directory.

>>> profilemost_probable_kmer('data01.fna', 'data01.prof')
'CCGAG'
>>> profilemost_probable_kmer('data02.fna', 'data02.prof')
'AGCAGCTT'
>>> profilemost_probable_kmer('data03.fna', 'data03.prof')
'AAGCAGAGTTTA'