The term DNA sequencing refers to the techniques used to determine the order of the nucleobases adenine (A), cytosine (C), guanine (G) and thymine (T) in a DNA molecule. The standard way to represent these bases is by the first letter of their names, which allows DNA to be represented as a string that consists of only the four letters A, C, G and T. DNA strings are often millions of characters long.
Substring matching is the process of determining whether a shorter string (a substring) occurs within a longer string. Substring matching plays an important role in reconstructing an unknown DNA string from various smaller parts (assembly), and in searching for interesting substrings (motifs) within a known DNA string.
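Python provides a string method find(substring[, begin][, end]) that returns the smallest index (integer) at which the argument substring occurs in the string, or -1 if it does not occur. The optional arguments begin and end restrict the search to the part of the string between those two positions.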
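As a small illustration of how find behaves, the sketch below searches the longer DNA string from the examples in this text. The variable names first, second and missing are chosen here purely for illustration.

```python
# DNA string taken from the examples in this text.
sequence = 'AGGAATGCTCGTAGGATACTGAATGCTCGGACGTACGCT'

# Smallest index at which 'GGA' occurs.
first = sequence.find('GGA')

# Searching again, starting one position past the first hit,
# yields the next occurrence.
second = sequence.find('GGA', first + 1)

# find returns -1 when the substring does not occur at all.
missing = sequence.find('TTTT')

print(first, second, missing)
```

Note that a single call to find reports only one occurrence; finding all of them requires repeated calls with a shifted start position.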
Write a function
motifs(sequence, subsequence[, begin][, end])
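where the function returns a list of all indices at which subsequence occurs in sequence, including overlapping occurrences. The optional arguments begin and end restrict the search range, as the following examples show: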
>>> motifs('AAA', 'A')
[0, 1, 2]
>>> motifs('AAA', 'A', begin=1)
[1, 2]
>>> motifs('AAA', 'A', end=2)
[0, 1]
>>> motifs('AAA', 'A', begin=1, end=2)
[1]
>>> motifs('AAA', 'AA')
[0, 1]
>>> motifs('AAA', 'C')
[]
>>> motifs('AGGAATGCTCGTAGGATACTGAATGCTCGGACGTACGCT', 'GGA')
[1, 13, 28]
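
One possible way to obtain the behavior shown in these examples is to call find repeatedly, restarting one position past each hit. The following is a minimal sketch, not a reference solution. The default values begin=0 and end=None are assumptions, since the text only says that both arguments are optional.

```python
def motifs(sequence, subsequence, begin=0, end=None):
    """Return the list of all indices at which subsequence occurs in sequence.

    The defaults begin=0 and end=None are assumptions made for this sketch.
    """
    if end is None:
        end = len(sequence)

    positions = []
    index = sequence.find(subsequence, begin, end)
    while index != -1:
        positions.append(index)
        # Restart one position past the last hit, so that overlapping
        # occurrences (such as 'AA' in 'AAA') are also reported.
        index = sequence.find(subsequence, index + 1, end)
    return positions
```

Advancing by one position rather than by the length of subsequence is what makes overlapping matches show up in the result.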