In this problem, we ask a simple question: how many times can one string occur as a substring of another? Recall from "Most frequent words" that different occurrences of a substring can overlap with each other. For example, ATA occurs three times in CGATATATCCATAG.
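To see why overlap matters, the small sketch below contrasts Python's built-in str.count, which only counts non-overlapping occurrences, with a count that slides a window one position at a time. The helper name count_overlapping is hypothetical and is not part of the assignment.

```python
def count_overlapping(pattern, text):
    """Count occurrences of pattern in text, allowing overlaps.

    Hypothetical helper used only for illustration.
    """
    width = len(pattern)
    # Slide a window of len(pattern) over text, one position at a time,
    # and count the windows that equal the pattern.
    return sum(
        text[start:start + width] == pattern
        for start in range(len(text) - width + 1)
    )


text = 'CGATATATCCATAG'
# str.count resumes searching after the end of each match,
# so it misses the occurrence that overlaps the first one.
non_overlapping = text.count('ATA')
overlapping = count_overlapping('ATA', text)
print(non_overlapping, overlapping)
```

Here str.count reports 2 while the sliding window reports 3, which matches the three occurrences mentioned above.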
Assignment
Write a function pattern_occurrences that takes two DNA strings $$p$$ and $$s$$. The function must return a tuple containing all starting positions in $$s$$ where $$p$$ appears as a substring, including occurrences that overlap. Use 0-based indexing.
Example
In the following interactive session, we assume the FASTA file data.fna to be located in
the current directory.
>>> pattern_occurrences('ATA', 'CGATATATCCATAG')
(2, 4, 10)
>>> pattern_occurrences('ATAT', 'GATATATGCATATACTT')
(1, 3, 9)
>>> from Bio import SeqIO
>>> pattern_occurrences(*SeqIO.parse('data.fna', 'fasta'))
(0, 46, 51, 74)
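One possible way to satisfy the session above is sketched here; it is not necessarily the intended solution. It repeatedly calls str.find and restarts the search one position past the previous hit, so overlapping occurrences are kept. Because the last call passes records parsed by Biopython rather than plain strings, the sketch assumes that such records expose their sequence through a seq attribute that converts to a string with str(); plain strings are used unchanged.

```python
def pattern_occurrences(p, s):
    """Return a tuple with all 0-based start positions of p in s.

    Overlapping occurrences are included.
    """
    # Assumption: an argument is either a plain string or a record-like
    # object whose sequence is available as its .seq attribute.
    p = str(getattr(p, 'seq', p))
    s = str(getattr(s, 'seq', s))

    positions = []
    start = s.find(p)
    while start != -1:
        positions.append(start)
        # Resume one position further (not past the whole match),
        # so that overlapping occurrences are also found.
        start = s.find(p, start + 1)
    return tuple(positions)
```

Advancing by one position instead of by len(p) is the design choice that distinguishes this from a non-overlapping search.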