In dealing with global alignment in "Global alignment with scoring matrix", we encountered a linear gap penalty, in which the insertion or deletion of a gap is penalized by some constant times the length of the gap. However, this model is not always the most realistic one, since a single large rearrangement could have inserted or deleted a long gap in one step to transform one genetic string into another.
With a constant gap penalty, every gap receives the same predetermined penalty, regardless of its length. Thus, the insertion or deletion of 1000 contiguous symbols is penalized exactly as much as that of a single symbol.
Your task:
Write a function globalAlignmentScore that takes two protein strings $$s$$ and $$t$$. The function must return the global alignment score between $$s$$ and $$t$$, using the BLOSUM62 scoring matrix and a constant gap penalty equal to 5.
Write a function globalAlignment that takes two protein strings $$s$$ and $$t$$. The function must return a tuple containing two augmented strings $$s'$$ and $$t'$$ representing an optimal alignment of $$s$$ and $$t$$. To align the two given protein strings, the function must use the BLOSUM62 scoring matrix and a constant gap penalty equal to 5.
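One possible way to approach the score computation is a three-table dynamic program, in which a gap is charged once when it is opened and costs nothing to extend. The sketch below is only an illustration under stated assumptions: the names `constant_gap_score`, `sub` and `toy_sub` are hypothetical and not part of the assignment, and the substitution scores are passed in as a callable so that the example does not depend on a particular source for BLOSUM62.

```python
import math


def constant_gap_score(s, t, sub, gap=5):
    """Global alignment score where every maximal gap costs `gap` once.

    `sub(a, b)` is an assumed callable returning the substitution score
    of symbols a and b (for the assignment this would be BLOSUM62).
    """
    neg = -math.inf
    m, n = len(s), len(t)
    # match[i][j]: best score for s[:i] vs t[:j] ending in a symbol pair
    # gap_t[i][j]: best score ending with s[i-1] opposite a gap symbol in t
    # gap_s[i][j]: best score ending with t[j-1] opposite a gap symbol in s
    match = [[neg] * (n + 1) for _ in range(m + 1)]
    gap_t = [[neg] * (n + 1) for _ in range(m + 1)]
    gap_s = [[neg] * (n + 1) for _ in range(m + 1)]
    match[0][0] = 0
    for i in range(1, m + 1):
        gap_t[i][0] = -gap  # a single gap spanning all of s[:i]
    for j in range(1, n + 1):
        gap_s[0][j] = -gap  # a single gap spanning all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match[i][j] = sub(s[i - 1], t[j - 1]) + max(
                match[i - 1][j - 1], gap_t[i - 1][j - 1], gap_s[i - 1][j - 1]
            )
            # Extending an open gap is free; opening a new one costs `gap`.
            gap_t[i][j] = max(
                gap_t[i - 1][j], match[i - 1][j] - gap, gap_s[i - 1][j] - gap
            )
            gap_s[i][j] = max(
                gap_s[i][j - 1], match[i][j - 1] - gap, gap_t[i][j - 1] - gap
            )
    return max(match[m][n], gap_t[m][n], gap_s[m][n])


def toy_sub(a, b):
    """Toy scoring used only for illustration: +1 match, -1 mismatch."""
    return 1 if a == b else -1
```

With the toy scoring and a gap penalty of 2, `constant_gap_score('AAATTTTTCCC', 'AAACCC', toy_sub, gap=2)` evaluates to 4: six matches and one gap of length five that is charged only once. Recovering the augmented strings would additionally require backtracking through the three tables, which is left out here.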
In the following interactive session, we assume the FASTA file data.faa to be located in the current directory.
>>> from Bio import SeqIO
>>> globalAlignmentScore('PLEASANTLY', 'MEANLY')
13
>>> globalAlignmentScore(*SeqIO.parse('data.faa', 'fasta'))
2793
>>> globalAlignment('PLEASANTLY', 'MEANLY')
('PLEASANTLY', '-MEA--N-LY')
>>> globalAlignment(*SeqIO.parse('data.faa', 'fasta'))
('SFLEDSNLKIRPNRQSSWVGPTMCS--AIYWIQMTRSLKMLVPEHNAQQRDTEDVLESPDFKWWWCLIEGTVKHAMKKMETPP------------HAIFMSPWYCGLRVVRTNFL---MDFKDFMYESMLCEYPWCYTIDMVPNMVFRNQIWRHEMYPAK----WFFRGIVGRLEFFECRSIYNVVQW-G-WL-SRS-TLMMRVQVPETMGKPEMGATVYFIIHCEWVITAWLQYKEKQREDEACAHKCCRWERKQMCLNQELFFHPQTCLALAHDQWCVCWYGQVKRMVYTWVH--------CNVQWYWLSHRNGGPAMAQCEAQIHSVSMMHTRLCDYKRRSWFWQWVCACQVEYAACPEADSGFMQDSGWLDKTLSDKCV--DCMWYAMCVPCISRMHVCFP--AKLLQGHAQSYPYFNCWAQVMWWA-------KNMIQ-----FIQ-NWTCMMKASDHYLDAGKHRPQFAAVLN-QCPTVVVQ------QIVLSPARMDIFTWTELTQT-----LYCVTPQMGPWRPCLIGNCNVLVVYMECFFGPMLTMMFSEPPQLLGDEKRGGLRG------NHCYFMGAHPQADVQNLVMFEYPFLYYREQWTDLFGSYQPEDYNSMLYHRFMDDCIMYHYG----ILRNNCLKAG---ILNAFVKPRQNIRNNTREHCFALQAWATPEMDFERYWAGAENFQSGMIMQYWPHGYMDYQWQMHKAEYPNWIPWAQSWA--LQYLHPCWTFYDNVIFNIKTCQGVWKYYYTFEYLMAGIQQQGESSKMKSFYYKWTNLERMEIEQSMMCCCCGRNLRAGTVAEPMHNVADYKFTE--MHYGWYHTPDIGYHVRAFEKGTPVYRSPKATLHTE----TME-VNWDVCAWPW-YQW-------LTVDEQHV', 
'SFLVQSRQKIRPNRQSIW--PTMCRYPAIYWIQMPRSLKML-------RKDTEDVLESPDFR-----IEGTVKHA-EEMHNVGDGWGNVAISVDYHAIF--------RVVRTNFLQAHM-------KS------WCYT--------FR--IWRHEMYPASRMIDWFFRGIVGVLEFF--RSD----EWRGQWACNRVYTIMM------TMGKPEMGATRYFIWHCEWVITKYIQ-----REDWACAHKCCRWERKQVCH----FFHPQTC----HDQWCVCWYGQVKRMVYTWYHNGLHWNKCCNVNIYWLSWRNHGPAMAQCEAQIHS----HTRLCDYKRRSWFW---------YAACPEADIGIMQDSGW---TL---CVFMRCMWYVMCVPCWSR-HVCFPDIA------ATSYPYFNCWAQVMWWGPHQNCQRKNMWQEDYQEFIQINK--LLKASDHYLDAHKHRCQFAAVLMWQCPAMVVMRMTMYNQIVLSPARMDIFTWTELRQTFRTNDL-CVHPQMGPWRPCLIG-----VVYAECFFGSMLT--------LLGDEKRGGLRGWHHWDCGHCYVMGAHPHAWAQCLVMFMYPFL---EQWTDLFGSYQPEDVNSMLYHRFMGE-----YARSQKILR----KAPQTHILNAF---------NTREH---LQAW--PEMDFERYWAGAENFQSGMIMQYWPHGYMDYQWQMH-------IPWAQSWANFKMYLHPCWT---------KTCQGV------FEYLMAGWQ----CCKMKSF---WTS-QTFDY-QSMMCCCCGNPVR--TVAE-MHNVAD-KFTKRHMHYGWYHTPDMGYHMPAFTKGTPVYRSPKATLHTESDMITAQNVNW------WVYQWGHWPFFRITVDEQHV')
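The first sample alignment in the session above can be checked by hand: sum the substitution scores of the aligned symbol pairs and subtract 5 for every maximal run of gap symbols. The sketch below does this for ('PLEASANTLY', '-MEA--N-LY'). The names `count_gaps`, `score_alignment` and `BLOSUM62_EXCERPT` are hypothetical helpers introduced for illustration, and the dictionary holds only the handful of BLOSUM62 entries that this particular alignment needs, not the full matrix.

```python
from itertools import groupby

# Excerpt of BLOSUM62: only the pairs occurring in the sample alignment.
BLOSUM62_EXCERPT = {
    ('L', 'M'): 2,
    ('E', 'E'): 5,
    ('A', 'A'): 4,
    ('N', 'N'): 6,
    ('L', 'L'): 4,
    ('Y', 'Y'): 7,
}


def count_gaps(aligned):
    """Count maximal runs of '-' in an augmented string."""
    return sum(1 for symbol, _ in groupby(aligned) if symbol == '-')


def score_alignment(s_aligned, t_aligned, sub, gap=5):
    """Score an alignment under a constant gap penalty.

    `sub` is assumed to be a dict keyed by (symbol in s, symbol in t).
    """
    total = 0
    for a, b in zip(s_aligned, t_aligned):
        if a != '-' and b != '-':
            total += sub[(a, b)]
    # Each gap is charged once, however long it is.
    return total - gap * (count_gaps(s_aligned) + count_gaps(t_aligned))


print(count_gaps('-MEA--N-LY'))  # 3 gaps, although 4 gap symbols
print(score_alignment('PLEASANTLY', '-MEA--N-LY', BLOSUM62_EXCERPT))  # 13
```

The six aligned pairs contribute 2 + 5 + 4 + 6 + 4 + 7 = 28, and the three gaps cost 3 × 5 = 15, giving the score of 13 reported above. Under a linear penalty of 5 per gap symbol, the same alignment would instead lose 20.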