If you implemented BWMATCHING in “BWMatching”, you probably found the algorithm to be slow. The reason for its sluggishness is that updating the pointers top and bottom is time-intensive, since it requires examining every symbol in LastColumn between top and bottom at each step.
To improve BWMATCHING, we introduce a function Countsymbol(i, LastColumn), which returns the number of occurrences of symbol in the first i positions of LastColumn. For example, Count"n”(10, "smnpbnnaaaaa$a”) = 3, and Count"a”(4, "smnpbnnaaaaa$a”) = 0.
Furthermore, we introduce a function FirstOccurrence(symbol, FirstColumn) which returns the position of the first occurrence of symbol in FirstColumn.
The green lines from BWMATCHING can now be compactly described by the following two lines:
Implement the better_bwt_matching method which takes a FASTA filename, containing a string containing a '$' and a couple of string patterns (without a '$'!).
The function should return a list of integers, where the i-th integer corresponds to the number of substring matches of the i-th member of Patterns in Text.
>>> better_bwt_matching('data01.fna'1) (0, 1, 0, 0, 0, 1, 0, 0, 0, 0) >>> better_bwt_matching('data02.fna'2) (1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 1)