As we saw in "Counting optimal alignments1" there will usually be a huge number of different optimal alignments of two given strings. In this problem, which represents a first attempt to understand how much optimal alignments can differ. We will select two symbols at a time from the two strings and ask how much the maximum alignment score can differ from the optimal score if we demand that these two symbols must be aligned (i.e., implying that one symbol must be substituted for the other).
Say that we have two strings $$s$$ and $$t$$ of respective lengths $$m$$ and $$n$$ and an alignment score. Let's define a matrix $$\textrm{M}$$ corresponding to $$s$$ and $$t$$ by setting $$\textrm{M}_{j, k}$$ equal to the maximum score of any alignment that aligns $$s[j]$$ with $$t[k]$$. So each entry in $$\textrm{M}$$ can be equal to at most the maximum score of any alignment of $$s$$ and $$t$$. Your task:
Write a function global_alignment_score that takes two DNA strings $$s$$ and $$t$$. The function must return the score of the optimal global alignment between $$s$$ and $$t$$. To align the two given DNA strings, the function must use an alignment score in which matching symbols count $$+1$$, substitutions count $$-1$$, and there is a linear gap penalty of $$1$$.
Write a function fixed_symbol_alignment_scores that takes two DNA strings $$s$$ and $$t$$. The function must return the matrix $$M$$ containing the scores of the fixed symbol alignments of $$s$$ and $$t$$, as defined above. In doing so, the function must use an alignment score in which matching symbols count $$+1$$, substitutions count $$-1$$, and there is a linear gap penalty of $$1$$. The matrix $$M$$ must be returned as a list of columns, in which each column is itself represented as a list of integers.
In the following interactive session, we assume the FASTA file data.fna2 to be located in the current directory.
>>> from Bio import SeqIO >>> global_alignment_score('ATAGATA', 'ACAGGTA') 3 >>> fixed_symbol_alignment_scores('ATAGATA', 'ACAGGTA') [[3, 0, -1, -4, -7, -10, -11], [0, 3, 0, -1, -4, -5, -10], [-1, 0, 3, 0, -3, -6, -5], [-4, -1, -2, 3, 2, -3, -6], [-5, -4, -1, 0, 3, 0, -1], [-10, -7, -6, -3, 0, 3, -2], [-11, -10, -7, -6, -3, -2, 3]] >>> global_alignment_score(*SeqIO.parse('data.fna3', 'fasta')) 1 >>> fixed_symbol_alignment_scores(*SeqIO.parse('data.fna4', 'fasta')) [[1, -1, -4, -7], [-1, 0, -1, -4], [-2, 1, 0, -3], [-7, -4, -1, 1]]