We have covered both global and local alignments. However, sometimes we need a hybrid approach that avoids the weaknesses of both methods. One such alternative is the fitting alignment outlined in "Finding a motif with modifications".
Another tactic is to allow ourselves to trim off gaps appearing on the ends of a global alignment for free. This is relevant if one of our strings to be aligned happens to contain additional symbols on the ends that are not relevant for the particular alignment at hand.
A semiglobal alignment of strings $$s$$ and $$t$$ is an alignment in which any gaps appearing as prefixes or suffixes of $$s$$ and $$t$$ do not contribute to the alignment score. Your task:
Write a function semiGlobalAlignmentScore that takes two DNA strings $$s$$ and $$t$$. The function must return the maximum semiglobal alignment score of $$s$$ and $$t$$. To align the two given DNA strings, the function must use an alignment score in which matching symbols count +1, substitutions count -1, and there is a linear gap penalty of 1.
Write a function semiGlobalAlignment that takes two DNA strings $$s$$ and $$t$$. The function must return a tuple containing an alignment of $$s$$ and $$t$$ that achieves the maximum semiglobal alignment score of $$s$$ and $$t$$. The scoring is the same as above: matching symbols count +1, substitutions count -1, and there is a linear gap penalty of 1. If multiple optimal alignments exist, the function may return any one of them.
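One possible way to approach both tasks is the usual global-alignment dynamic programming table with two changes: the first row and column are initialised to zero (leading gaps are free), and the best score is read from anywhere in the last row or last column (trailing gaps are free). The sketch below is only an illustration under those assumptions. The name `semiglobal_sketch` and its keyword parameters are hypothetical and not part of the exercise, and the function expects plain Python strings.

```python
def semiglobal_sketch(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for one optimal semiglobal alignment.

    Hypothetical helper: gaps before the first or after the last aligned
    symbol of either string are not penalised.
    """
    n, m = len(s), len(t)
    # score[i][j]: best score for s[:i] versus t[:j] with free leading gaps.
    # Row 0 and column 0 stay at zero because leading gaps cost nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trailing gaps are free, so the alignment may end anywhere in the
    # last row or the last column.
    candidates = [(score[n][j], n, j) for j in range(m + 1)]
    candidates += [(score[i][m], i, m) for i in range(n + 1)]
    best, i, j = max(candidates)

    # Unaligned tails: at this point i == n or j == m, so one tail is empty.
    tail_s = s[i:] + '-' * (m - j)
    tail_t = '-' * (n - i) + t[j:]

    # Trace back until the first row or column is reached.
    col_s, col_t = [], []
    while i > 0 and j > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            col_s.append(s[i - 1])
            col_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            col_s.append(s[i - 1])
            col_t.append('-')
            i -= 1
        else:
            col_s.append('-')
            col_t.append(t[j - 1])
            j -= 1

    # Unaligned heads: now i == 0 or j == 0, so one head is empty.
    head_s = s[:i] + '-' * j
    head_t = '-' * i + t[:j]

    aligned_s = head_s + ''.join(reversed(col_s)) + tail_s
    aligned_t = head_t + ''.join(reversed(col_t)) + tail_t
    return best, aligned_s, aligned_t
```

The table has $$(|s|+1) \times (|t|+1)$$ cells, so both time and memory grow with the product of the two lengths. Because several alignments can be optimal, the strings returned by this sketch need not match a reference alignment character for character; only the score is expected to agree.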
In the following interactive session, we assume the FASTA file data.fna to be located in the current directory.
    >>> from Bio import SeqIO
    >>> semiGlobalAlignmentScore('CAGCACTTGGATTCTCGG', 'CAGCGTGG')
    4
    >>> semiGlobalAlignmentScore(*SeqIO.parse('data.fna', 'fasta'))
    163
    >>> semiGlobalAlignment('CAGCACTTGGATTCTCGG', 'CAGCGTGG')
    ('CAGCA-CTTGGATTCTCGG', '---CAGCGTGG--------')
    >>> semiGlobalAlignment(*SeqIO.parse('data.fna', 'fasta'))
    ('-------------------GGGGGCGGCCCCGGACGGCCGTTAGCGCACACCCCTTTGAAGTACTCGCGAGTGCCGAAGGTTACTCAAGGCAACCCAGGGGGTGCTAAGTGGCTGAACCGTCATGCAATAGGTAGCTACTCCTCGGCAGAGTCCCTCGAGGGCTCAAGCTCGCTAATGCCGAAGCTTCCGTCCAAACCCTAACTGTCACTGACCACTACTGAAACGCGCCTGACTGGGCCTATACCCGTAGTATTATACGCCGCAATTAACCTCCGCTCCGGTGGC--------', 'GGCTATGGAGGTGGACAATGGGG-CGGCCC-GGACGGCCGT-AG-GC-CACCCCTTTGAAGTA-TCGCGAGTGC-GA---TTACTCA-GGCA-CCCAGGGG---CTAAGTGGCTGAACCGT-AT-CAA-AGGTAGCTAC-C-TCG-CAGAGTCCCTCG-GGGCTCA-GCTCGCTACTGCCGA-GCT-CC-TC-AA-CCCT--CTGTCAC-G-CC-CTACT-AA-C-CGC-TGA-TGGGCCTA-A----T-GT-TTATACGCCGCA-T-AACCT--GC-CCGGTGG-TGACTAGC')
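To sanity-check an alignment such as the first one in the session above, it can help to score the two gapped strings directly while skipping the free end gaps. The following is a minimal sketch; the helper name `score_semiglobal` is an assumption made for illustration and is not part of the exercise.

```python
def score_semiglobal(aligned_s, aligned_t, match=1, mismatch=-1, gap=-1):
    """Score two gapped strings, ignoring gaps that form a prefix or suffix.

    Hypothetical checker: both arguments must have the same length and use
    '-' as the gap symbol.
    """
    if len(aligned_s) != len(aligned_t):
        raise ValueError("aligned strings must have equal length")

    # Number of free columns at the start and at the end: the longest run
    # of leading (or trailing) gaps found in either string.
    strings = (aligned_s, aligned_t)
    lead = max(len(x) - len(x.lstrip('-')) for x in strings)
    trail = max(len(x) - len(x.rstrip('-')) for x in strings)

    total = 0
    for a, b in zip(aligned_s[lead:len(aligned_s) - trail],
                    aligned_t[lead:len(aligned_t) - trail]):
        if a == '-' or b == '-':
            total += gap        # interior gap: linear penalty
        elif a == b:
            total += match      # matching symbols
        else:
            total += mismatch   # substitution
    return total


print(score_semiglobal('CAGCA-CTTGGATTCTCGG', '---CAGCGTGG--------'))  # 4
```

In the short example, the three leading and eight trailing gap columns are skipped, leaving six matches, one substitution and one interior gap, which gives 6 - 1 - 1 = 4.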