A gap is a contiguous sequence of spaces in a row of an alignment. One way to score gaps more appropriately is to define an affine penalty for a gap of length $$k$$ as $$\sigma + \varepsilon\cdot (k - 1)$$, where $$\sigma$$ is the gap opening penalty, charged for the first symbol in the gap, and $$\varepsilon$$ is the gap extension penalty, charged for each additional symbol in the gap. We typically select $$\varepsilon$$ to be smaller than $$\sigma$$ so that, for $$k > 1$$, the affine penalty for a gap of length $$k$$ is smaller than the penalty for $$k$$ independent single-symbol indels ($$\sigma\cdot k$$).
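To make the comparison concrete, the short sketch below evaluates both penalties for a gap of length 3. The function names `affine_gap_penalty` and `linear_gap_penalty` are illustrative and not part of the assignment; the default values $$\sigma = 11$$ and $$\varepsilon = 1$$ are the ones used in this assignment.

```python
def affine_gap_penalty(k, sigma=11, epsilon=1):
    """Penalty for one gap of length k: sigma + epsilon * (k - 1)."""
    if k < 1:
        raise ValueError("a gap must have length at least 1")
    return sigma + epsilon * (k - 1)


def linear_gap_penalty(k, sigma=11):
    """Penalty for k independent single-symbol indels: sigma * k."""
    return sigma * k


print(affine_gap_penalty(3))  # 11 + 1 * 2 = 13
print(linear_gap_penalty(3))  # 11 * 3 = 33
```

For a gap of length 1 the two penalties coincide, so the affine scheme only changes the cost of longer gaps.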
In this assignment we will construct a highest-scoring global alignment (with affine gap penalties) between two strings. To score alignments, we use the BLOSUM62 scoring matrix, a gap opening penalty of 11 and a gap extension penalty of 1. Your task:
Write a function global_alignment_score that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return the global alignment score of $$v$$ and $$w$$.
Write a function global_alignment that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return a global alignment of $$v$$ and $$w$$, represented as a tuple of two strings with indels represented by hyphens (-). If multiple global alignments achieving the maximum score exist, the function may return any one.
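Both functions start by reading the two sequences from the FASTA file. A minimal sketch of such a reader is given below, assuming the usual FASTA layout in which each record starts with a header line beginning with `>` and the sequence may be spread over several lines. The helper name `read_fasta` is hypothetical; it is not prescribed by the assignment.

```python
def read_fasta(path):
    """Return the sequences in a FASTA file, in order of appearance."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records.append([])  # a header line starts a new record
            elif records:
                records[-1].append(line)  # sequence lines are concatenated
    return ["".join(parts) for parts in records]
```

With a file containing exactly two records, `v, w = read_fasta(path)` would then unpack the two sequences.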
In the following interactive session, we assume that the FASTA file data01.faa is located in the current directory.
>>> global_alignment_score('data01.faa')
8
>>> global_alignment('data01.faa')
('PRT---EINS', 'PRTWPSEIN-')
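One common way to compute a global alignment score under affine gap penalties is a dynamic program with three tables: one for alignments ending in a gap in $$w$$, one for alignments ending in a gap in $$v$$, and one for the best alignment overall. The following is a minimal sketch of that idea, not a reference solution. The name `affine_global_score` and the `score` callback are assumptions made for illustration, and the example uses a toy match/mismatch scorer instead of BLOSUM62, which you would have to supply yourself. The sketch returns only the score; recovering an alignment would additionally require backtracking through the tables.

```python
NEG_INF = float("-inf")


def affine_global_score(v, w, score, gap_open=11, gap_extend=1):
    """Best global alignment score of v and w under affine gap penalties.

    `score(a, b)` is a caller-supplied substitution score (an assumption
    of this sketch; the assignment itself uses BLOSUM62).
    """
    n, m = len(v), len(w)
    # lower[i][j]:  best score for v[:i], w[:j] ending with v[i-1] against a gap
    # upper[i][j]:  best score for v[:i], w[:j] ending with a gap against w[j-1]
    # middle[i][j]: best score for v[:i], w[:j] with any ending
    lower = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    upper = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    middle = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    middle[0][0] = 0
    for i in range(1, n + 1):  # v[:i] aligned against a single gap of length i
        lower[i][0] = -(gap_open + gap_extend * (i - 1))
        middle[i][0] = lower[i][0]
    for j in range(1, m + 1):  # w[:j] aligned against a single gap of length j
        upper[0][j] = -(gap_open + gap_extend * (j - 1))
        middle[0][j] = upper[0][j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            lower[i][j] = max(lower[i - 1][j] - gap_extend,
                              middle[i - 1][j] - gap_open)
            upper[i][j] = max(upper[i][j - 1] - gap_extend,
                              middle[i][j - 1] - gap_open)
            diagonal = middle[i - 1][j - 1] + score(v[i - 1], w[j - 1])
            middle[i][j] = max(diagonal, lower[i][j], upper[i][j])
    return middle[n][m]


def toy_score(a, b):
    """Toy scorer for illustration only: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


# Best alignment is ACGT / A--T: two matches minus one gap of length 2.
print(affine_global_score("ACGT", "AT", toy_score, gap_open=2, gap_extend=1))  # -1
```

Keeping separate tables for the two gap states is what lets the recurrence distinguish between opening a gap (cost $$\sigma$$) and extending one (cost $$\varepsilon$$); a single table cannot tell whether the previous column already ended in a gap.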