The term DNA sequencing refers to the techniques used to determine the order of the nucleobases adenine (A), cytosine (C), guanine (G) and thymine (T) in a DNA molecule. By convention, each base is represented by the first letter of its name, so that DNA can be written as a string that consists of only the four letters A, C, G and T. DNA strings are often millions of characters long.

Substring matching is the process of determining whether a shorter string (a substring) occurs within a longer string. Substring matching plays an important role in reconstructing an unknown DNA string from various smaller parts (assembly¹), and in searching for interesting substrings (motifs²) within a known DNA string.

Python provides a string method find(substring[, begin[, end]]) that returns the lowest index (an integer) at which the $$substring$$ occurs within the slice of the string delimited by $$begin \leq index < end$$. The parameters $$begin$$ and $$end$$ are optional, and the method returns the value -1 if the substring cannot be found within the given string. However, genome researchers generally want to find all positions at which a substring occurs in a given DNA string, not only the position of its first occurrence.
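The behaviour of find described above can be illustrated with a short sketch. The variable names (dna, first, second, missing) are chosen here for illustration only; the DNA string is the one used in the last example of this assignment.

```python
# Illustrative only: variable names are not part of the assignment.
dna = 'AGGAATGCTCGTAGGATACTGAATGCTCGGACGTACGCT'

# Without begin and end, the whole string is searched and the
# index of the first occurrence is returned.
first = dna.find('GGA')

# With begin=2, the search only considers the slice dna[2:].
second = dna.find('GGA', 2)

# With begin=2 and end=10, the slice dna[2:10] contains no 'GGA',
# so find returns -1.
missing = dna.find('GGA', 2, 10)

print(first, second, missing)
```

Note that a single call only ever reports one position, which is why finding all occurrences requires repeated calls or a different approach.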

Assignment

Write a function

motifs(sequence, subsequence[, begin][, end])

where $$begin$$ and $$end$$ are optional arguments whose default values are the beginning and the end of the string sequence, respectively. The values $$begin$$ and $$end$$ must be interpreted in the same way as when slicing strings or lists: \[0 = begin \leq index < end = \text{len(sequence)}\] The function must return a list of the positions at which the subsequence occurs within the given sequence. Each position in the list is the index of the first letter of an occurrence of the subsequence in the sequence, with $$begin \leq index < end$$.

Example

>>> motifs('AAA', 'A')
[0, 1, 2]
>>> motifs('AAA', 'A', begin=1)
[1, 2]
>>> motifs('AAA', 'A', end=2)
[0, 1]
>>> motifs('AAA', 'A', begin=1, end=2)
[1]
>>> motifs('AAA', 'AA')
[0, 1]
>>> motifs('AAA', 'C')
[]
>>> motifs('AGGAATGCTCGTAGGATACTGAATGCTCGGACGTACGCT', 'GGA')
[1, 13, 28]
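One possible approach, given as a minimal sketch rather than a reference solution, is to call find repeatedly and restart the search one position past each hit, so that overlapping occurrences are also reported. The sketch assumes that an occurrence must lie entirely inside sequence[begin:end], which is how find itself treats its arguments; if only the first letter has to fall inside the interval, the handling of end would need to be adapted. Using None as the default for end is likewise an implementation choice made here.

```python
def motifs(sequence, subsequence, begin=0, end=None):
    """Return all start indices of subsequence in sequence[begin:end].

    Assumption: an occurrence counts only if it fits completely
    inside the slice, mirroring the behaviour of str.find.
    """
    if end is None:
        # default: search up to the end of the sequence
        end = len(sequence)

    positions = []
    index = sequence.find(subsequence, begin, end)
    while index != -1:
        positions.append(index)
        # restart one position further to catch overlapping occurrences
        index = sequence.find(subsequence, index + 1, end)
    return positions
```

Advancing by one position instead of by the length of the subsequence is what makes motifs('AAA', 'AA') report both 0 and 1.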