Comparing multiple strings simultaneously

In "Consensus and profile" we generalized the notion of Hamming distance to find an average case for a collection of nucleic acids or peptides. However, this method only works if the polymers have the same length. As we have already noted in "Edit distance", homologous strands of DNA have varying lengths because mutations insert and delete intervals of genetic material. As a result, we need to generalize the notion of alignment to cover multiple strings.

Assignment

A multiple alignment of a collection of three or more strings is formed by adding gap symbols to the strings to produce a collection of augmented strings all having the same length.
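The definition above can be checked mechanically. The following is a minimal sketch, assuming the gap symbol is '-' (as in the example session further down); the name is_multiple_alignment is hypothetical and not part of the assignment.

```python
def is_multiple_alignment(originals, augmented, gap="-"):
    """Return True if `augmented` is a multiple alignment of `originals`.

    Hypothetical helper: the gap symbol '-' is an assumption taken from
    the example session, not from the definition itself.
    """
    # There must be exactly one augmented string per original string.
    if len(originals) != len(augmented):
        return False
    # All augmented strings must share one common length.
    if len({len(s) for s in augmented}) > 1:
        return False
    # Removing the gap symbols must give back each original string.
    return all(a.replace(gap, "") == o for o, a in zip(originals, augmented))


originals = ("ATATCCG", "TCCG", "ATGTACTG", "ATGTCTG")
augmented = ("ATAT-CCG", "-T---CCG", "ATGTACTG", "ATGT-CTG")
print(is_multiple_alignment(originals, augmented))  # True
```

The check only verifies that the augmented strings form a valid multiple alignment; it says nothing about how good that alignment is.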

A multiple alignment score is obtained by taking the sum of an alignment score over all possible pairs of augmented strings. The only difference from scoring an alignment of two strings is that two gap symbols may be aligned in a given pair, so we must also specify a score for matched gap symbols.
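The sum over all pairs can be sketched as follows. The scoring values are an assumption, since the text above does not fix them: equal symbols (including two aligned gap symbols) score 0 and differing symbols score -1. The names pair_score and sum_of_pairs_score are hypothetical.

```python
from itertools import combinations


def pair_score(s, t, match=0, mismatch=-1):
    """Score two augmented strings of equal length, column by column.

    Assumed scheme: equal symbols (including gap against gap) score
    `match`, any two different symbols score `mismatch`.
    """
    return sum(match if a == b else mismatch for a, b in zip(s, t))


def sum_of_pairs_score(augmented, match=0, mismatch=-1):
    """Sum the pairwise score over all unordered pairs of augmented strings."""
    return sum(
        pair_score(s, t, match, mismatch)
        for s, t in combinations(augmented, 2)
    )


augmented = ("ATAT-CCG", "-T---CCG", "ATGTACTG", "ATGT-CTG")
print(sum_of_pairs_score(augmented))  # -18
```

Under this assumed scheme, the first alignment shown in the example session scores -18, which agrees with the score reported there. This only evaluates a given alignment; it does not search for the best one.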

Your task:

Example

In the following interactive session, we assume the FASTA file data.fna to be located in the current directory.

>>> from Bio import SeqIO

>>> multipleAlignmentScore('ATATCCG', 'TCCG', 'ATGTACTG', 'ATGTCTG')
-18
>>> multipleAlignmentScore(*SeqIO.parse('data.fna', 'fasta'))
-35

>>> multipleAlignment('ATATCCG', 'TCCG', 'ATGTACTG', 'ATGTCTG')
('ATAT-CCG', '-T---CCG', 'ATGTACTG', 'ATGT-CTG')
>>> multipleAlignment(*SeqIO.parse('data.fna', 'fasta'))
('-CGTCCATG-', 'GAATAGG-GT', 'ACATAGGGG-', 'CCAGCTG-G-')