Given a profile matrix $$\mathcal{P}$$, we can evaluate the probability of every $$k$$-mer in a string $$s$$ and find a $$\mathcal{P}$$-most probable $$k$$-mer in $$s$$, i.e., a $$k$$-mer that was most likely to have been generated by $$\mathcal{P}$$ among all $$k$$-mers in $$s$$. For example, if

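$$
\mathcal{P} = \begin{array}{c|cccccccccccc}
\texttt{A} & .2 & .2 & .0 & .0 & .0 & .0 & .9 & .1 & .1 & .1 & .3 & .0 \\
\texttt{C} & .1 & .6 & .0 & .0 & .0 & .0 & .0 & .4 & .1 & .2 & .4 & .6 \\
\texttt{G} & .0 & .0 & 1.0 & 1.0 & .9 & .9 & .1 & .0 & .0 & .0 & .0 & .0 \\
\texttt{T} & .7 & .2 & .0 & .0 & .1 & .1 & .0 & .5 & .8 & .7 & .3 & .4
\end{array}
$$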

then ACGGGGATTACC is the $$\mathcal{P}$$-most probable 12-mer in GGTACGGGGATTACCT. Indeed, every other 12-mer in this string has probability 0.
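Assuming the usual definition, the probability that $$\mathcal{P}$$ generates a $$k$$-mer $$p_1 p_2 \cdots p_k$$ is the product of the entries $$\mathcal{P}(p_j, j)$$, i.e. the value in the row of nucleotide $$p_j$$ and column $$j$$. Worked out for the example above:

$$
\Pr(p_1 p_2 \cdots p_k \mid \mathcal{P}) = \prod_{j=1}^{k} \mathcal{P}(p_j, j)
$$

$$
\Pr(\texttt{ACGGGGATTACC} \mid \mathcal{P}) = .2 \cdot .6 \cdot 1 \cdot 1 \cdot .9 \cdot .9 \cdot .9 \cdot .5 \cdot .8 \cdot .1 \cdot .4 \cdot .6 = 0.000839808
$$

Each of the other four 12-mers contains a position whose entry is 0: `GGTACGGGGATT` and `GTACGGGGATTA` start with `G` ($$\mathcal{P}(\texttt{G}, 1) = 0$$), `TACGGGGATTAC` has `C` in column 3 ($$\mathcal{P}(\texttt{C}, 3) = 0$$), and `CGGGGATTACCT` has `G` in column 2 ($$\mathcal{P}(\texttt{G}, 2) = 0$$).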

Assignment

Write a function `profilemost_probable_kmer` that takes the locations of two files: i) a FASTA file containing a DNA string $$s$$ and ii) a text file containing a profile matrix $$\mathcal{P}$$, with each row on a separate line and the values on each row separated by spaces. The function must return a $$\mathcal{P}$$-most probable $$k$$-mer in $$s$$, where $$k$$ is the number of columns of $$\mathcal{P}$$. If $$s$$ contains multiple $$\mathcal{P}$$-most probable $$k$$-mers, the function may return any one of them.
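A minimal sketch of one possible solution, assuming that the FASTA file holds a single record, that the rows of the profile file appear in the order A, C, G, T (as in the example matrix above), and that $$s$$ contains only the letters A, C, G and T. The helper names `read_fasta_sequence`, `read_profile`, `kmer_probability` and `most_probable_kmer` are hypothetical. Ties are broken by keeping the leftmost $$k$$-mer, which the assignment allows.

```python
def read_fasta_sequence(path):
    """Return the sequence of the first record in a FASTA file.

    Assumption: the file holds a single DNA string, possibly wrapped over
    several lines after a '>' header line.
    """
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                if chunks:  # a second header: stop after the first record
                    break
                continue
            chunks.append(line)
    return ''.join(chunks).upper()


def read_profile(path):
    """Read a profile matrix with one row per line and space-separated values.

    Assumption: the rows are given in the order A, C, G, T.
    """
    with open(path) as handle:
        rows = [[float(value) for value in line.split()]
                for line in handle if line.strip()]
    return dict(zip('ACGT', rows))


def kmer_probability(kmer, profile):
    """Multiply the profile entries for each nucleotide at its position."""
    probability = 1.0
    for position, nucleotide in enumerate(kmer):
        probability *= profile[nucleotide][position]
    return probability


def most_probable_kmer(s, profile):
    """Return the leftmost k-mer of s with maximal probability under profile."""
    k = len(profile['A'])  # k is the number of columns of the profile
    if len(s) < k:
        raise ValueError('sequence is shorter than the profile')
    best_kmer, best_probability = None, -1.0
    for start in range(len(s) - k + 1):
        kmer = s[start:start + k]
        probability = kmer_probability(kmer, profile)
        if probability > best_probability:  # strict '>' keeps the leftmost tie
            best_kmer, best_probability = kmer, probability
    return best_kmer


def profilemost_probable_kmer(fasta_path, profile_path):
    """Return a profile-most probable k-mer of the DNA string in fasta_path."""
    s = read_fasta_sequence(fasta_path)
    profile = read_profile(profile_path)
    return most_probable_kmer(s, profile)
```

Because the initial best probability is -1, the first $$k$$-mer is always accepted, so even a string in which every $$k$$-mer has probability 0 yields a valid answer. Plain floating-point products are adequate for $$k$$-mers as short as those in the examples; for much longer profiles one could compare sums of logarithms instead, taking care of zero entries.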

Example

In the following interactive session, we assume that the FASTA files data01.fna, data02.fna and data03.fna and the text files data01.prof, data02.prof and data03.prof are located in the current directory.

    >>> profilemost_probable_kmer('data01.fna', 'data01.prof')
    'CCGAG'
    >>> profilemost_probable_kmer('data02.fna', 'data02.prof')
    'AGCAGCTT'
    >>> profilemost_probable_kmer('data03.fna', 'data03.prof')
    'AAGCAGAGTTTA'