Organizing strings

When cataloguing a collection of genetic strings, we should have an established system by which to organize them. The standard method is to organize strings as they would appear in a dictionary, so that APPLE precedes APRON, which in turn comes before ARMOR.

Assignment

Assume that an alphabet $$\mathscr{A}$$ has a predetermined order. That is, we write the alphabet as a permutation $$\mathscr{A} = (a_1, a_2, \ldots, a_n)$$, where $$a_1 < a_2 <  \ldots < a_n$$. For instance, the English alphabet is organized as (A, B, …, Z).

Given two strings $$s$$ and $$t$$ having the same length $$k$$, we say that $$s$$ precedes $$t$$ in the lexicographic order (and write $$s <_{\textrm{Lex}}t$$) if the first symbol $$s[j]$$ that doesn't match $$t[j]$$ satisfies $$s_j < t_j$$ in $$\mathscr{A}$$.

Write a function fixedLengthKmers that takes two arguments: i) an ordered collection of symbols defining an ordered alphabet, and ii) a positive integer $$k \in \mathbb{N}$$. The function must return a list containing all strings of length $$k$$ that can be formed from symbols of the given alphabet, ordered lexicographically. If the given alphabet is not a permutation of different symbols, the function must raise an AssertionError with the message invalid alphabet.

Example

>>> fixedLengthKmers('TAGC', 2)
['TT', 'TA', 'TG', 'TC', 'AT', 'AA', 'AG', 'AC', 'GT', 'GA', 'GG', 'GC', 'CT', 'CA', 'CG', 'CC']
>>> fixedLengthKmers('XY', 3)
['XXX', 'XXY', 'XYX', 'XYY', 'YXX', 'YXY', 'YYX', 'YYY']
>>> fixedLengthKmers('IOVXG', 2)
['II', 'IO', 'IV', 'IX', 'IG', 'OI', 'OO', 'OV', 'OX', 'OG', 'VI', 'VO', 'VV', 'VX', 'VG', 'XI', 'XO', 'XV', 'XX', 'XG', 'GI', 'GO', 'GV', 'GX', 'GG']

Note on lexicographic order

As illustrated in the examples, the alphabet order in this assignment is defined by the order in which symbols are provided in the given alphabet, which is not necessarily the traditional order of the English alphabet.