The pseudocode below for LinearSpaceAlignment describes how to recursively find a longest path in the alignment graph constructed for a substring $$v_{\text{top}+1}\ldots v_\text{bottom}$$ of $$v$$ and a substring $$w_{\text{left}+1}\ldots w_\text{right}$$ of $$w$$. LinearSpaceAlignment calls the function MiddleNode(top, bottom, left, right), which returns the coordinate $$i$$ of the middle node $$(i, j)$$ defined by the sequences $$v_{\text{top}+1}\ldots v_\text{bottom}$$ and $$w_{\text{left}+1}\ldots w_\text{right}$$. LinearSpaceAlignment also calls MiddleEdge(top, bottom, left, right), which returns → , ↓, or ↘ depending on whether the middle edge is horizontal, vertical, or diagonal. The linear-space alignment of strings $$v$$ and $$w$$ is constructed by calling LinearSpaceAlignment(0, n, 0, m). The case left = right describes the alignment of an empty string against the string $$v_{\text{top}+1}\ldots v_\text{bottom}$$, which is trivially computed as the score of a gap formed by bottom - top vertical edges.
LinearSpaceAlignment(top, bottom, left, right) if left = right return alignment formed by bottom - top vertical edges if top = bottom return alignment formed by right − left horizontal edges middle ← ⌊(left + right)/2⌋ midNode ← MiddleNode(top, bottom, left, right) midEdge ← MiddleEdge(top, bottom, left, right) LinearSpaceAlignment(top, midNode, left, middle) output midEdge if midEdge = "→" or midEdge = "↘" middle ← middle + 1 if midEdge = "↓" or midEdge ="↘" midNode ← midNode + 1 LinearSpaceAlignment(midNode, bottom, middle, right)
In this assignment we will construct a highest-scoring global alignment (with linear gap penalties) between two strings using linear space. To score alignments, we use the BLOSUM62 scoring matrix and an indel penalty $$\sigma = 5$$. Your task:
Write a function global_alignment_score that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return the global alignment score of $$v$$ and $$w$$.
Write a function global_alignment that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return a global alignment of $$v$$ and $$w$$, represented as a tuple of two strings with indels represented by hyphens (-). If multiple global alignments achieving the maximum score exist, the function may return any one.
In the following interactive session, we assume the FASTA file data01.faa1 to be located in the current directory.
>>> global_alignment_score('data01.faa') 8 >>> global_alignment('data01.faa') ('PLEASANTLY', '-MEA--N-LY')