Given a string $$s$$, its $$k$$-mer composition composition($$k$$, $$s$$) is the collection of all $$k$$-mer substrings of $$s$$ (including repeated $$k$$-mers). For example,
>>> composition(3, 'TATGGGGTGC')
['ATG', 'GGG', 'GGG', 'GGT', 'GTG', 'TAT', 'TGC', 'TGG']
Note that we have listed $$k$$-mers in lexicographic order (i.e., how they would appear in a dictionary) rather than in the order of their appearance in TATGGGGTGC. We have done this because the correct ordering of the reads is unknown when they are generated.
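The definition above can be sketched in a few lines of Python: slide a window of length $$k$$ over $$s$$ and sort the resulting substrings. The name kmer_composition is hypothetical and only used for illustration; this is one possible approach, not a reference solution.

```python
def kmer_composition(k, s):
    """Return all k-mers of s (repeats included) in lexicographic order."""
    # A string of length n has n - k + 1 windows of length k;
    # range() is empty when k > len(s), so the result is then an empty list.
    return sorted(s[i:i + k] for i in range(len(s) - k + 1))


print(kmer_composition(3, 'TATGGGGTGC'))
# ['ATG', 'GGG', 'GGG', 'GGT', 'GTG', 'TAT', 'TGC', 'TGG']
```

Using sorted on the list of substrings keeps repeated $$k$$-mers such as GGG, which a set would silently drop.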
Write a function composition that takes three arguments: i) an integer $$k \in \mathbb{N}_0$$, ii) the location of a FASTA file containing a DNA string $$s$$, and iii) the location of an output file. The function must generate the $$k$$-mer composition of the string $$s$$ and write the lexicographically ordered $$k$$-mers in FASTA format to the file whose location is passed as the third argument.
In the following interactive session, we assume the FASTA files data01.fna and output01.fna to be located in the current directory.
>>> composition(5, 'data01.fna', 'output01.fna')
>>> print(open('output01.fna').read().rstrip())
>seq01
AATCC
>seq02
ATCCA
>seq03
CAATC
>seq04
CCAAC
>seq05
TCCAA
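A minimal sketch of the file-based function follows, using only the standard library. Several details are assumptions rather than part of the specification: the input file is taken to hold a single FASTA record, the helper read_fasta_sequence is a hypothetical name, and the output headers follow the seq01, seq02, ... pattern of the session above, zero-padded to at least two digits.

```python
def read_fasta_sequence(path):
    """Return the sequence stored in a FASTA file as one string.

    Assumption: the file holds a single record, so every line that is
    not a header (starting with '>') belongs to the same sequence.
    """
    with open(path) as handle:
        return ''.join(
            line.strip() for line in handle if not line.startswith('>')
        )


def composition(k, source, target):
    """Write the sorted k-mer composition of the DNA string in source to target."""
    s = read_fasta_sequence(source)

    # All windows of length k, repeats included, in lexicographic order.
    kmers = sorted(s[i:i + k] for i in range(len(s) - k + 1))

    with open(target, 'w') as out:
        for number, kmer in enumerate(kmers, start=1):
            # Assumed header format: seq01, seq02, ... as in the example session.
            out.write(f'>seq{number:02d}\n{kmer}\n')
```

The sequence is joined across lines before the $$k$$-mers are extracted, because FASTA files commonly wrap long sequences over several lines and a $$k$$-mer may span a line break.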