Two motifs, one gene

Recall that in "Finding a shared spliced motif1" we found the longest motif that could have been shared by two genetic strings, allowing for the motif to be split onto multiple exons in the process. As a result, we needed to find a longest common subsequence of the two strings (which extended the problem of finding a longest common substring from "Finding a shared motif2").

In this assignment, we consider an inverse problem of sorts in which we are given two different motifs and we wish to interleave them together in such a way as to produce a shortest possible genetic string in which both motifs occur as subsequences.

Assignment

A string $$s$$ is a supersequence of another string $$t$$ if $$s$$ contains $$t$$ as a subsequence.

A common supersequence of strings $$s$$ and $$t$$ is a string that serves as a supersequence of both $$s$$ and $$t$$. For example, GACCTAGGAACTC serves as a common supersequence of ACGTC and ATAT. A shortest common supersequence of $$s$$ and $$t$$ is a supersequence for which there does not exist a shorter common supersequence. Continuing our example, ACGTACT is a shortest common supersequence of ACGTC and ATAT.

Write a function shortestCommonSupersequence that takes two DNA strings $$s$$ and $$t$$. The function must return a shortest common supersequence of $$s$$ and $$t$$. If multiple solutions exist, the function may return any one.

Example

In the following interactive session, we assume the FASTA file data.fna3 to be located in the current directory.

>>> shortestCommonSupersequence('ATCTGAT', 'TGCATA')
'ATGCATGAT'

>>> from Bio import SeqIO
>>> shortestCommonSupersequence(*SeqIO.parse('data.fna', 'fasta'))
'CGCCCGTAGTTGGGTTACTACACACTCTCGTCCAATTACGGGTAATAATGTGTTACGCTCTCATGTTCATGGACATTGTCTAACCCCTGGTTGCTAGTCACCGCTAGGGCTTGGCGAGCTGATCCCCAAAGGA'