Suppose that we wish to compare the approximately 20,000 amino acid-long NRP synthetase from Bacillus brevis with the approximately 600 amino acid-long A-domain from Streptomyces roseosporus, the bacterium that produces the powerful antibiotic daptomycin. We hope to find a region within the longer protein sequence $$v$$ that has high similarity with all of the shorter sequence $$w$$. Global alignment will not work because it tries to align all of $$v$$ to all of $$w$$. Local alignment will not work because it aligns substrings of both $$v$$ and $$w$$, whereas we want to use all of $$w$$. Thus, we need a distinct kind of alignment, called fitting alignment.
"Fitting" $$w$$ to $$v$$ requires finding a substring $$v'$$ of $$v$$ that maximizes the global alignment score between $$v'$$ and $$w$$ among all substrings of $$v$$.
In this assignment, we will construct a highest-scoring fitting alignment between two strings. To score alignments, we use a simple scoring scheme in which each match scores +1 and each mismatch or indel is penalized by 1 (that is, it scores -1). Your task:
Write a function fitting_alignment_score that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return the maximum score of a fitting alignment of $$v$$ and $$w$$.
Write a function fitting_alignment that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return a fitting alignment of $$v$$ and $$w$$, represented as a tuple of two strings with indels represented by hyphens (-). If multiple fitting alignments achieving the maximum score exist, the function may return any one.
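The definition of fitting alignment given above suggests a dynamic programming approach similar to global alignment, with two changes: the substring $$v'$$ may start anywhere in $$v$$ at no cost, and it may end anywhere in $$v$$. The following is only a minimal sketch of the score computation under those assumptions, not a reference solution. The helper names read_fasta and fitting_score are hypothetical, and the sketch assumes that the first record in the FASTA file is $$v$$ and the second is $$w$$.

```python
def read_fasta(path):
    """Return the sequences of a FASTA file as strings, in file order."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records.append([])         # a header line starts a new record
            elif records:
                records[-1].append(line)   # a sequence may span several lines
    return ["".join(parts) for parts in records]


def fitting_score(v, w, match=1, mismatch=1, indel=1):
    """Best global alignment score of w against any substring of v."""
    m = len(w)
    # Row 0: nothing of v is used yet, so the first j symbols of w cost j indels.
    prev = [-indel * j for j in range(m + 1)]
    best = prev[m]  # corresponds to an empty substring of v
    for i in range(1, len(v) + 1):
        # Column 0 stays 0 in every row: v' may start at any position for free.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if v[i - 1] == w[j - 1] else -mismatch)
            cur[j] = max(diag, prev[j] - indel, cur[j - 1] - indel)
        # All of w is consumed in the last column; v' may end at any row.
        best = max(best, cur[m])
        prev = cur
    return best


def fitting_alignment_score(path):
    """Assumes the first FASTA record is v and the second is w."""
    v, w = read_fasta(path)[:2]
    return fitting_score(v, w)
```

This version keeps only two rows of the table, which is enough for the score. Recovering the alignment itself would additionally require storing backtracking pointers for the full table.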
In the following interactive session, we assume that the FASTA file data01.fna is located in the current directory.
>>> fitting_alignment_score('data01.fna')
2
>>> fitting_alignment('data01.fna')
('TAGGCTTA', 'TAGA-T-A')
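As a sanity check, the scoring scheme described earlier can be applied column by column to the alignment in the sample output. The helper below is a hypothetical illustration (the name alignment_score is not part of the assignment). It assumes both strings have equal length and that hyphens mark indels.

```python
def alignment_score(top, bottom, match=1, mismatch=1, indel=1):
    """Score an alignment given as two equal-length strings with '-' for indels."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score -= indel      # indel column
        elif a == b:
            score += match      # match column
        else:
            score -= mismatch   # mismatch column
    return score


# Five matches, one mismatch, and two indels: 5 - 1 - 2 = 2
print(alignment_score("TAGGCTTA", "TAGA-T-A"))  # 2
```

A check of this kind can help confirm that the tuple returned by fitting_alignment is consistent with the value returned by fitting_alignment_score.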