A gap is a contiguous sequence of spaces in a row of an alignment. One way to score gaps more appropriately is to define an affine penalty for a gap of length $$k$$ as $$\sigma + \varepsilon\cdot (k - 1)$$, where $$\sigma$$ is the gap opening penalty, assessed to the first symbol in the gap, and $$\varepsilon$$ is the gap extension penalty, assessed to each additional symbol in the gap. We typically select $$\varepsilon$$ to be smaller than $$\sigma$$ so that the affine penalty for a gap of length $$k$$ is smaller than the penalty for $$k$$ independent single-nucleotide indels ($$\sigma\cdot k$$).
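To make the formula concrete, here is a minimal sketch of the affine penalty as a Python function. The name affine_gap_penalty is hypothetical, and the default values 11 and 1 are simply the penalties used in the assignment.

```python
def affine_gap_penalty(k, sigma=11, epsilon=1):
    """Penalty for one gap of length k: sigma for the first symbol,
    epsilon for each additional symbol."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return sigma + epsilon * (k - 1)


# One gap of length 3 versus three independent single-symbol gaps.
print(affine_gap_penalty(3))       # 13
print(3 * affine_gap_penalty(1))   # 33
```

With these values a single gap of length 3 costs 13, whereas three separate gaps of length 1 cost 33, which is why the affine model favours a few long gaps over many short ones.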

Assignment

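In this assignment we construct a highest-scoring global alignment (with affine gap penalties) between two strings. To score alignments, we use the BLOSUM62 scoring matrix, a gap opening penalty of 11 and a gap extension penalty of 1. Your task is to write the two functions used in the example: global_alignment_score, which takes the location of a FASTA file containing the two strings and returns the score of a highest-scoring alignment, and global_alignment, which takes the same argument and returns such an alignment as a tuple of two strings.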
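The score computation for such an alignment can be sketched with three dynamic-programming tables, one for each way an alignment can end: in a match or mismatch, in a gap in the second string, or in a gap in the first string. The following is only a sketch under stated assumptions, not the reference solution. The name affine_global_score and the score callback are hypothetical, FASTA parsing is left out, and a toy match/mismatch scoring stands in for BLOSUM62, which would have to be supplied separately.

```python
def affine_global_score(v, w, score, sigma=11, epsilon=1):
    """Maximum global alignment score of v and w with affine gap penalties.

    score(a, b) is an assumed callback returning the substitution score.
    """
    neg_inf = float("-inf")
    n, m = len(v), len(w)
    # lower[i][j]:  best score of v[:i] vs w[:j] ending with v[i-1] against a gap
    # upper[i][j]:  best score of v[:i] vs w[:j] ending with w[j-1] against a gap
    # middle[i][j]: best score of v[:i] vs w[:j] over all three endings
    lower = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    upper = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    middle = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    middle[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0:
                # Extend an existing gap (epsilon) or open a new one (sigma).
                lower[i][j] = max(lower[i - 1][j] - epsilon,
                                  middle[i - 1][j] - sigma)
            if j > 0:
                upper[i][j] = max(upper[i][j - 1] - epsilon,
                                  middle[i][j - 1] - sigma)
            diagonal = neg_inf
            if i > 0 and j > 0:
                diagonal = middle[i - 1][j - 1] + score(v[i - 1], w[j - 1])
            middle[i][j] = max(diagonal, lower[i][j], upper[i][j])

    return middle[n][m]


def toy_score(a, b):
    """Stand-in scoring (not BLOSUM62): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


# Two matches (A, T) and one gap of length 2 with sigma=2, epsilon=1.
print(affine_global_score("ACGT", "AT", toy_score, sigma=2, epsilon=1))  # -1
```

Recovering the alignment itself would additionally require recording, for every cell, which of the candidates produced the maximum, and then backtracking from the final cell.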

Example

In the following interactive session, we assume the FASTA file data01.faa to be located in the current directory.

>>> global_alignment_score('data01.faa')
8

>>> global_alignment('data01.faa')
('PRT---EINS', 'PRTWPSEIN-')
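As a sanity check, the score of the alignment shown above can be recomputed by hand: identical pairs contribute their BLOSUM62 score, and each run of gap symbols in a row contributes one affine penalty. The sketch below does this in Python. The function score_alignment is hypothetical, and the dictionary holds only the six BLOSUM62 diagonal entries this particular alignment needs, so it raises an error for any other pair.

```python
from itertools import groupby

# BLOSUM62 scores for the identical pairs occurring in the example alignment.
# This is deliberately not the full matrix.
BLOSUM62_DIAGONAL = {"P": 7, "R": 5, "T": 5, "E": 5, "I": 4, "N": 6}


def score_alignment(row1, row2, sigma=11, epsilon=1):
    """Score an alignment whose aligned pairs are all identical symbols."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    total = 0
    # Every maximal run of '-' within a row is one gap of length k.
    for row in (row1, row2):
        for symbol, run in groupby(row):
            if symbol == "-":
                k = len(list(run))
                total -= sigma + epsilon * (k - 1)
    # Columns without a gap symbol contribute a substitution score.
    for a, b in zip(row1, row2):
        if a != "-" and b != "-":
            if a != b:
                raise ValueError("this sketch only covers identical pairs")
            total += BLOSUM62_DIAGONAL[a]
    return total


print(score_alignment("PRT---EINS", "PRTWPSEIN-"))  # 8
```

The six matches contribute 7 + 5 + 5 + 5 + 4 + 6 = 32, the gap of length 3 costs 11 + 2 = 13 and the gap of length 1 costs 11, which gives 32 - 13 - 11 = 8.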