Suppose we wish to compare the approximately 20,000-amino-acid-long NRP synthetase from Bacillus brevis with the approximately 600-amino-acid-long A-domain from Streptomyces roseosporus, the bacterium that produces the powerful antibiotic daptomycin. We hope to find a region within the longer protein sequence $$v$$ that has high similarity with all of the shorter sequence $$w$$. Global alignment will not work because it tries to align all of $$v$$ to all of $$w$$. Local alignment will not work either, because it aligns substrings of both $$v$$ and $$w$$, whereas here all of $$w$$ must take part in the alignment. Thus, we have a distinct alignment application called the fitting alignment.

"Fitting" $$w$$ to $$v$$ requires finding a substring $$v'$$ of $$v$$ that maximizes the global alignment score between $$v'$$ and $$w$$ among all substrings of $$v$$.
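One way to read this definition is as a global alignment in which skipping a prefix and a suffix of $$v$$ costs nothing. The following is a minimal sketch under that reading, not a reference solution: the name `fitting_score` and its parameters are hypothetical, it works on plain strings rather than a FASTA file, and its default scores assume the simple scheme used in the assignment (match +1, mismatch and indel penalties of 1).

```python
def fitting_score(v, w, match=1, mismatch=1, indel=1):
    """Best global-alignment score of w against any substring of v (sketch)."""
    # prev[j] is the best score for aligning w[:j] with a substring of v
    # that ends at the current position in v. For the empty prefix of v,
    # every symbol of w has to be paired with a gap.
    prev = [-indel * j for j in range(len(w) + 1)]
    best = prev[-1]
    for a in v:
        curr = [0]  # the alignment may start anywhere in v at no cost
        for j, b in enumerate(w, start=1):
            diagonal = prev[j - 1] + (match if a == b else -mismatch)
            curr.append(max(diagonal, prev[j] - indel, curr[j - 1] - indel))
        best = max(best, curr[-1])  # the alignment may end anywhere in v
        prev = curr
    return best


# Illustrative strings only; they are not taken from any data file.
print(fitting_score("GATTACA", "TTA"))
```

Only two rows of the dynamic programming table are kept at a time, which is enough when the score alone is needed; recovering the alignment itself requires the full table or backtracking pointers.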

Assignment

In this assignment we will construct a highest-scoring fitting alignment between two strings. To score alignments, we use the simple scoring method in which matches count +1 and both the mismatch and indel penalties are equal to 1. Your task:

Write a function fitting_alignment_score that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return the maximum score of a fitting alignment of $$v$$ and $$w$$.
Write a function fitting_alignment that takes the location of a FASTA file containing two amino acid sequences $$v$$ and $$w$$. The function must return a fitting alignment of $$v$$ and $$w$$, represented as a tuple of two strings with indels represented by hyphens (-). If multiple fitting alignments achieving the maximum score exist, the function may return any one.
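Returning an alignment rather than a score requires backtracking through the dynamic programming table. The sketch below shows one possible approach, assuming the same simple scoring; the helper name `fitting_alignment_strings` is hypothetical and it takes the two sequences as strings, so reading the FASTA file is left out. Ties are broken arbitrarily, which the task permits.

```python
def fitting_alignment_strings(v, w, match=1, mismatch=1, indel=1):
    """Return one highest-scoring fitting alignment of w to v (sketch)."""
    n, m = len(v), len(w)
    # s[i][j]: best score for aligning w[:j] with a substring of v ending at i.
    s = [[0] * (m + 1) for _ in range(n + 1)]  # column 0 stays 0: free start in v
    for j in range(1, m + 1):
        s[0][j] = -indel * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else -mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub,
                          s[i - 1][j] - indel,
                          s[i][j - 1] - indel)

    # Free end in v: start backtracking from the best cell of the last column.
    i = max(range(n + 1), key=lambda row: s[row][m])
    j = m
    top, bottom = [], []
    while j > 0:  # stop once all of w is consumed (free start in v)
        if i > 0:
            sub = match if v[i - 1] == w[j - 1] else -mismatch
        if i > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            top.append(v[i - 1]); bottom.append(w[j - 1])
            i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] - indel:
            top.append(v[i - 1]); bottom.append('-')
            i -= 1
        else:
            top.append('-'); bottom.append(w[j - 1])
            j -= 1
    return ''.join(reversed(top)), ''.join(reversed(bottom))
```

Because several alignments can share the maximum score, the pair returned by this sketch may differ from another correct answer while still achieving the same score.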

Example

In the following interactive session, we assume the FASTA file data01.fna¹ to be located in the current directory.

>>> fitting_alignment_score('data01.fna')
2

>>> fitting_alignment('data01.fna')
('TAGGCTTA', 'TAGA-T-A')
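
The two results in the session above are consistent with each other, which can be checked by scoring the returned alignment column by column. The helper `alignment_score` below is a hypothetical name introduced for illustration; it assumes the simple scoring from the assignment and that hyphens mark indels.

```python
def alignment_score(top, bottom, match=1, mismatch=1, indel=1):
    """Score a gapped alignment given as two equal-length strings (sketch)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == '-' or b == '-':
            score -= indel      # a symbol paired with a gap
        elif a == b:
            score += match      # identical symbols
        else:
            score -= mismatch   # different symbols
    return score


# Five matches, one mismatch and two indels: 5 - 1 - 2 = 2.
print(alignment_score('TAGGCTTA', 'TAGA-T-A'))  # prints 2
```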