Suppose we wish to compare the approximately 20,000 amino acid-long NRP synthetase from Bacillus brevis with the approximately 600 amino acid-long A-domain from Streptomyces roseosporus, the bacterium that produces the powerful antibiotic daptomycin. We hope to find a region within the longer protein sequence $$v$$ that has high similarity with all of the shorter sequence $$w$$. Global alignment will not work because it tries to align all of $$v$$ to all of $$w$$. Local alignment will not work either, because it may align only a substring of $$w$$ rather than all of it. Thus, we need a distinct kind of alignment, called a fitting alignment.

"Fitting" $$w$$ to $$v$$ requires finding a substring $$v'$$ of $$v$$ that maximizes the global alignment score between $$v'$$ and $$w$$ among all substrings of $$v$$.
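One way to read this definition is as a global alignment whose start and end positions in $$v$$ are free. The sketch below is a minimal illustration under that reading, not a reference solution: the function name `fitting_score` and its keyword parameters are hypothetical, it works on plain strings rather than a FASTA file, and the default scores assume +1 for a match and a penalty of 1 for mismatches and indels.

```python
def fitting_score(v: str, w: str, match: int = 1, mismatch: int = 1, indel: int = 1) -> int:
    """Best global-alignment score between w and any substring of v (hypothetical helper)."""
    # Row 0: the empty substring of v against w[:j] costs j indels.
    prev = [-indel * j for j in range(len(w) + 1)]
    best = prev[-1]
    for i in range(1, len(v) + 1):
        # Column 0 stays 0: skipping a prefix of v is free (free start in v).
        curr = [0] * (len(w) + 1)
        for j in range(1, len(w) + 1):
            diag = prev[j - 1] + (match if v[i - 1] == w[j - 1] else -mismatch)
            curr[j] = max(diag, prev[j] - indel, curr[j - 1] - indel)
        # All of w is consumed in the last column; any row may end the substring.
        best = max(best, curr[-1])
        prev = curr
    return best


# Illustrative strings chosen for this sketch.
print(fitting_score("GTAGGCTTAAGGTTA", "TAGATA"))  # 2
```

Only two rows of the dynamic programming table are kept, which is enough when just the score is needed; recovering the alignment itself requires the full table or stored backpointers.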

Assignment

In this assignment we will construct a highest-scoring fitting alignment between two strings. To score alignments, we use the simple scoring method in which matches count +1 and both the mismatch and indel penalties are equal to 1. Your task:

Example

In the following interactive session, we assume the FASTA file data01.fna to be located in the current directory.

>>> fitting_alignment_score('data01.fna')
2

>>> fitting_alignment('data01.fna')
('TAGGCTTA', 'TAGA-T-A')
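To show how such an alignment could be recovered, here is a hedged sketch that fills the full table and backtracks from the best cell in the last column. The names `fitting_alignment_strings` and `alignment_score` are hypothetical, the functions take plain strings (reading the FASTA file is left out), and the input strings are illustrative. Several alignments can share the optimal score, so the pair returned here may differ from the one shown in the session while scoring the same.

```python
def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = 1, indel: int = 1) -> int:
    """Score two gapped strings column by column (hypothetical helper)."""
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score -= indel
        elif a == b:
            score += match
        else:
            score -= mismatch
    return score


def fitting_alignment_strings(v: str, w: str, match: int = 1, mismatch: int = 1, indel: int = 1):
    """Return one optimal fitting alignment of w to v as (top, bottom)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]  # column 0 is 0: free start in v
    for j in range(1, m + 1):
        s[0][j] = -indel * j

    def sub(i: int, j: int) -> int:
        return match if v[i - 1] == w[j - 1] else -mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j), s[i - 1][j] - indel, s[i][j - 1] - indel)

    # Free end in v: start backtracking from the best row of the last column.
    i = max(range(n + 1), key=lambda r: s[r][m])
    j = m
    top, bottom = [], []
    while j > 0:  # stop once all of w is consumed
        if i > 0 and s[i][j] == s[i - 1][j - 1] + sub(i, j):
            top.append(v[i - 1]); bottom.append(w[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] - indel:
            top.append(v[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


top, bottom = fitting_alignment_strings("GTAGGCTTAAGGTTA", "TAGATA")
print(top, bottom, alignment_score(top, bottom))
```

Checking a candidate by its score rather than by exact string equality is the safer comparison, because of the ties mentioned above.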