Penalizing large insertions and deletions

In "Global alignment with scoring matrix", we encountered a linear gap penalty, in which a gap is penalized by some constant times its length. However, this model is not always realistic: a single large rearrangement could have inserted or deleted a long stretch of symbols in one step when transforming one genetic string into another.
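To make the limitation of the linear model concrete, the following minimal sketch computes the linear gap penalty of an aligned string and also reports the lengths of its gap runs. The function names `gap_run_lengths` and `linear_gap_penalty` and the per-symbol penalty of 5 are assumptions chosen for illustration; they are not part of the assignment.

```python
from itertools import groupby


def gap_run_lengths(aligned, gap="-"):
    """Return the lengths of the maximal runs of consecutive gap symbols."""
    return [
        len(list(run))
        for is_gap, run in groupby(aligned, key=lambda c: c == gap)
        if is_gap
    ]


def linear_gap_penalty(aligned, per_symbol=5):
    """Linear model: every gap symbol costs per_symbol (hypothetical value)."""
    return per_symbol * sum(gap_run_lengths(aligned))


one_long = "AC------GT"     # a single gap of length 6
scattered = "-A-C-G-T-A-C"  # six separate gaps of length 1

print(gap_run_lengths(one_long), linear_gap_penalty(one_long))
print(gap_run_lengths(scattered), linear_gap_penalty(scattered))
```

Both strings contain six gap symbols, so the linear model charges them the same amount, even though the first could be explained by one event and the second needs six.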

Assignment

In a constant gap penalty, every gap receives some predetermined constant penalty, regardless of its length. Thus, the insertion or deletion of 1000 contiguous symbols is penalized equally to that of a single symbol. Your task:

Example

In the following interactive session, we assume the FASTA file data.faa to be located in the current directory.

>>> from Bio import SeqIO

>>> globalAlignmentScore('PLEASANTLY', 'MEANLY')
13
>>> globalAlignmentScore(*SeqIO.parse('data.faa', 'fasta'))
2793

>>> globalAlignment('PLEASANTLY', 'MEANLY')
('PLEASANTLY', '-MEA--N-LY')
>>> globalAlignment(*SeqIO.parse('data.faa', 'fasta'))
('SFLEDSNLKIRPNRQSSWVGPTMCS--AIYWIQMTRSLKMLVPEHNAQQRDTEDVLESPDFKWWWCLIEGTVKHAMKKMETPP------------HAIFMSPWYCGLRVVRTNFL---MDFKDFMYESMLCEYPWCYTIDMVPNMVFRNQIWRHEMYPAK----WFFRGIVGRLEFFECRSIYNVVQW-G-WL-SRS-TLMMRVQVPETMGKPEMGATVYFIIHCEWVITAWLQYKEKQREDEACAHKCCRWERKQMCLNQELFFHPQTCLALAHDQWCVCWYGQVKRMVYTWVH--------CNVQWYWLSHRNGGPAMAQCEAQIHSVSMMHTRLCDYKRRSWFWQWVCACQVEYAACPEADSGFMQDSGWLDKTLSDKCV--DCMWYAMCVPCISRMHVCFP--AKLLQGHAQSYPYFNCWAQVMWWA-------KNMIQ-----FIQ-NWTCMMKASDHYLDAGKHRPQFAAVLN-QCPTVVVQ------QIVLSPARMDIFTWTELTQT-----LYCVTPQMGPWRPCLIGNCNVLVVYMECFFGPMLTMMFSEPPQLLGDEKRGGLRG------NHCYFMGAHPQADVQNLVMFEYPFLYYREQWTDLFGSYQPEDYNSMLYHRFMDDCIMYHYG----ILRNNCLKAG---ILNAFVKPRQNIRNNTREHCFALQAWATPEMDFERYWAGAENFQSGMIMQYWPHGYMDYQWQMHKAEYPNWIPWAQSWA--LQYLHPCWTFYDNVIFNIKTCQGVWKYYYTFEYLMAGIQQQGESSKMKSFYYKWTNLERMEIEQSMMCCCCGRNLRAGTVAEPMHNVADYKFTE--MHYGWYHTPDIGYHVRAFEKGTPVYRSPKATLHTE----TME-VNWDVCAWPW-YQW-------LTVDEQHV', 'SFLVQSRQKIRPNRQSIW--PTMCRYPAIYWIQMPRSLKML-------RKDTEDVLESPDFR-----IEGTVKHA-EEMHNVGDGWGNVAISVDYHAIF--------RVVRTNFLQAHM-------KS------WCYT--------FR--IWRHEMYPASRMIDWFFRGIVGVLEFF--RSD----EWRGQWACNRVYTIMM------TMGKPEMGATRYFIWHCEWVITKYIQ-----REDWACAHKCCRWERKQVCH----FFHPQTC----HDQWCVCWYGQVKRMVYTWYHNGLHWNKCCNVNIYWLSWRNHGPAMAQCEAQIHS----HTRLCDYKRRSWFW---------YAACPEADIGIMQDSGW---TL---CVFMRCMWYVMCVPCWSR-HVCFPDIA------ATSYPYFNCWAQVMWWGPHQNCQRKNMWQEDYQEFIQINK--LLKASDHYLDAHKHRCQFAAVLMWQCPAMVVMRMTMYNQIVLSPARMDIFTWTELRQTFRTNDL-CVHPQMGPWRPCLIG-----VVYAECFFGSMLT--------LLGDEKRGGLRGWHHWDCGHCYVMGAHPHAWAQCLVMFMYPFL---EQWTDLFGSYQPEDVNSMLYHRFMGE-----YARSQKILR----KAPQTHILNAF---------NTREH---LQAW--PEMDFERYWAGAENFQSGMIMQYWPHGYMDYQWQMH-------IPWAQSWANFKMYLHPCWT---------KTCQGV------FEYLMAGWQ----CCKMKSF---WTS-QTFDY-QSMMCCCCGNPVR--TVAE-MHNVAD-KFTKRHMHYGWYHTPDMGYHMPAFTKGTPVYRSPKATLHTESDMITAQNVNW------WVYQWGHWPFFRITVDEQHV')
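
One possible way to score a global alignment under the constant gap penalty defined in the assignment is a dynamic program with three states, one per kind of final column. The sketch below is an illustration only: the function name `constant_gap_score`, the match/mismatch scores of +1/-1 (used in place of a scoring matrix), and the gap penalty of 5 are all assumptions, so it will not reproduce the scores shown in the session above.

```python
NEG_INF = float("-inf")


def constant_gap_score(s, t, gap=5, match=1, mismatch=-1):
    """Global alignment score where each maximal gap costs `gap` once.

    Assumed toy scoring: +match for equal symbols, mismatch otherwise.
    """
    n, m = len(s), len(t)
    # M: last column aligns s[i-1] with t[j-1]
    # X: last column aligns s[i-1] with a gap symbol
    # Y: last column aligns a gap symbol with t[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = match if s[i - 1] == t[j - 1] else mismatch
                M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            if i > 0:
                # Opening a gap costs `gap`; extending an existing one is free.
                X[i][j] = max(M[i - 1][j] - gap, Y[i - 1][j] - gap, X[i - 1][j])
            if j > 0:
                Y[i][j] = max(M[i][j - 1] - gap, X[i][j - 1] - gap, Y[i][j - 1])

    return max(M[n][m], X[n][m], Y[n][m])


# Six matches and one gap of length 6: 6 - 5 = 1 under these assumed scores.
print(constant_gap_score("AAATTTTTTAAA", "AAAAAA"))
```

Keeping separate tables for the two gap states is what lets the recurrence tell "extend the current gap" (no extra cost) apart from "open a new gap" (cost `gap`); a single table cannot make that distinction.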