Organizing strings of different lengths

In "Enumerating k-mers lexicographically1", we introduced the lexicographic order for strings of the same length constructed from some ordered underlying alphabet. However, our experience with dictionaries suggests that we should be able to order strings of different lengths just as easily. That is, we already have an intuitive sense that APPLE comes before APPLET, which comes before ARTS, and so we should be able to apply this intuition toward cataloguing genetic strings of varying lengths.

Assignment

Say that we have strings $$s = s_1s_2\ldots s_m$$ and $$t = t_1t_2\ldots t_n$$ with $$m < n$$. Consider the substring $$t' = t[1:m]$$. We have two cases:

  1. If $$s = t'$$, then we set $$s <_{\textrm{Lex}} t$$ because $$s$$ is shorter than $$t$$ (e.g., APPLE $$<_{\textrm{Lex}}$$ APPLET).

  2. Otherwise, $$s \neq t'$$. We define $$s <_{\textrm{Lex}} t$$ if $$s <_{\textrm{Lex}}t'$$ and define $$s >_{\textrm{Lex}} t$$ if $$s >_{\textrm{Lex}} t'$$ (e.g., APPLET $$<_{\textrm{Lex}}$$ ARTS because APPL $$<_{\textrm{Lex}}$$ ARTS).

Write a function varyingLengthKmers that takes two arguments: i) an ordered collection of symbols defining an ordered alphabet $$\mathscr{A}$$, and ii) a positive integer $$k \in \mathbb{N_0}$$. The function must return a list containing all strings of length at most $$k$$ that can be formed from symbols of $$\mathscr{A}$$, ordered lexicographically. If the given alphabet is not a permutation of different symbols, the function must raise an AssertionError with the message invalid alphabet.

Note: As in "Enumerating k-mers lexicographically2", alphabet order is based on the order in which the symbols are given.

Example

>>> varyingLengthKmers('DNA', 3)
['D', 'DD', 'DDD', 'DDN', 'DDA', 'DN', 'DND', 'DNN', 'DNA', 'DA', 'DAD', 'DAN', 'DAA', 'N', 'ND', 'NDD', 'NDN', 'NDA', 'NN', 'NND', 'NNN', 'NNA', 'NA', 'NAD', 'NAN', 'NAA', 'A', 'AD', 'ADD', 'ADN', 'ADA', 'AN', 'AND', 'ANN', 'ANA', 'AA', 'AAD', 'AAN', 'AAA']

Extra information

We can combine conditions (1) and (2) above into a single condition by adding a blank character $$\varepsilon$$ to the beginning of our ordered alphabet. Then, if $$s$$ is shorter than $$t$$, we simply add as many instances of $$\varepsilon$$ as necessary to make $$s$$ and $$s$$ the same length.