Given a string and an integer k, the k-mer composition of the string
is the collection of all k-mer substrings of the string
(including repeated k-mers). For example,
>>> composition(3, 'TATGGGGTGC')
['ATG', 'GGG', 'GGG', 'GGT', 'GTG', 'TAT', 'TGC', 'TGG']
Note that we have listed the k-mers in lexicographic order (i.e.,
how they would appear in a dictionary) rather than in the order of their
appearance in TATGGGGTGC. We have done this because the
correct ordering of the reads is unknown when they are generated.
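The definition above can be sketched for an in-memory string as follows. This is only a minimal illustration: the name kmer_composition is hypothetical, chosen to keep it apart from the three-argument function requested in the assignment, and rejecting non-positive values of k is an assumption that the text does not state.

```python
def kmer_composition(k, text):
    """Return all k-mers of text, repeats included, in lexicographic order.

    Hypothetical helper, not part of the assignment's required interface.
    """
    if k <= 0:
        # Assumption: the text does not say how invalid k should be handled.
        raise ValueError("k must be a positive integer")
    # A string of length n has n - k + 1 substrings of length k;
    # the range is empty when k > n, which yields an empty composition.
    return sorted(text[i:i + k] for i in range(len(text) - k + 1))


print(kmer_composition(3, 'TATGGGGTGC'))
```

Sorting a list, rather than collecting the k-mers in a set, keeps repeated k-mers such as GGG in the result.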
Assignment
Write a function composition that takes three arguments: i)
an integer k, ii) the location of a FASTA
file containing a DNA string and iii) another file location.
The function must generate the k-mer composition of the DNA string, and
write the lexicographically ordered k-mers in FASTA format to the file
whose location is passed as the third argument.
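One possible approach to this assignment is sketched below. It is not a reference solution and rests on several assumptions: the input file holds a single FASTA record whose sequence may span multiple lines, the helper read_fasta_sequence is a hypothetical name, and the output headers follow a seqNN pattern with numbers zero-padded to at least two digits. The assignment text does not spell out the header format, so that pattern is only one plausible choice.

```python
def read_fasta_sequence(path):
    """Return the sequence of the first record in a FASTA file.

    Hypothetical helper; assumes header lines start with '>' and that
    the sequence may be wrapped over several lines.
    """
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                if seen_header:
                    break  # assumption: only the first record is used
                seen_header = True
            elif line:
                parts.append(line)
    return ''.join(parts)


def composition(k, source, target):
    """Write the sorted k-mer composition of the DNA string in source to target."""
    text = read_fasta_sequence(source)
    # All substrings of length k, repeats included, in lexicographic order.
    kmers = sorted(text[i:i + k] for i in range(len(text) - k + 1))
    with open(target, 'w') as handle:
        for number, kmer in enumerate(kmers, start=1):
            # Assumed header pattern: seq01, seq02, ...
            handle.write(f'>seq{number:02d}\n{kmer}\n')
```

Because the k-mers are sorted before numbering, the record numbers reflect lexicographic rank and not position in the original DNA string.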
Example
In the following interactive session, we assume that the FASTA file data01.fna is
located in the current directory, which is also where output01.fna is written.
>>> composition(5, 'data01.fna', 'output01.fna')
>>> print(open('output01.fna').read().rstrip())
>seq01
AATCC
>seq02
ATCCA
>seq03
CAATC
>seq04
CCAAC
>seq05
TCCAA