If you implemented BWMATCHING in “BWMatching”, you probably found the algorithm to be slow. The reason for its sluggishness is that updating the pointers top and bottom is time-intensive, since it requires examining every symbol in LastColumn between top and bottom at each step.

To improve BWMATCHING, we introduce a function Count_symbol(i, LastColumn), which returns the number of occurrences of symbol in the first i positions of LastColumn. For example, Count_"n”(10, "smnpbnnaaaaa$a”) = 3, and Count_"a”(4, "smnpbnnaaaaa$a”) = 0.

Furthermore, we introduce a function FirstOccurrence(symbol, FirstColumn) which returns the position of the first occurrence of symbol in FirstColumn.

The green lines from BWMATCHING can now be compactly described by the following two lines:

top ← FirstOccurence(symbol) + Count_symbol(top, LastColumn)
bottom ← FirstOccurence(symbol) + Count_symbol(bottom + 1, LastColumn) - 1

Assignment

Implement the better_bwt_matching method which takes a FASTA filename, containing a string containing a '$' and a couple of string patterns (without a '$'!).

The function should return a list of integers, where the i-th integer corresponds to the number of substring matches of the i-th member of Patterns in Text.

Example

>>> better_bwt_matching('data01.fna'1)
(0, 1, 0, 0, 0, 1, 0, 0, 0, 0)

>>> better_bwt_matching('data02.fna'2)
(1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 1)