Define the skew of a DNA string $$s$$, denoted $$\text{skew}(s)$$, as the number of occurrences of nucleotide G in $$s$$ minus the number of occurrences of nucleotide C in $$s$$. Let $$\text{prefix}_i(s)$$ denote the prefix (i.e., initial substring) of $$s$$ of length $$i$$. For example, if

$$s$$ = CATGGGCATCGGCCATACGCC

the values of $$\text{skew}(\text{prefix}_i(s))$$ for $$i = 0, 1, \ldots, 21$$ are

0 -1 -1 -1 0 1 2 1 1 1 0 1 2 1 0 0 0 0 -1 0 -1 -2
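The prefix skews can be computed incrementally: the empty prefix has skew 0, and each next nucleotide adds 1 (for G), subtracts 1 (for C), or leaves the skew unchanged (for A or T). The helper `prefix_skews` below is a hypothetical name used only for illustration, and it assumes $$s$$ is written in uppercase letters. It reproduces the 22 values listed above.

```python
def prefix_skews(s):
    """Return [skew(prefix_0(s)), skew(prefix_1(s)), ..., skew(prefix_|s|(s))].

    Hypothetical helper for illustration; assumes uppercase nucleotides.
    """
    skews = [0]  # the empty prefix contains no G or C, so its skew is 0
    for base in s:
        # G raises the skew by one, C lowers it by one, A and T leave it unchanged
        step = 1 if base == 'G' else -1 if base == 'C' else 0
        skews.append(skews[-1] + step)
    return skews


print(*prefix_skews('CATGGGCATCGGCCATACGCC'))
# 0 -1 -1 -1 0 1 2 1 1 1 0 1 2 1 0 0 0 0 -1 0 -1 -2
```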

Assignment

Write a function minimum_skew that takes a DNA string $$s$$. The function must return a tuple containing all integers $$i$$ that minimize $$\text{skew}(\text{prefix}_i(s))$$ over all values of $$i$$ (from 0 to $$|s|$$). The integers in the tuple must be in increasing order.
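One possible approach, sketched here under the assumption that $$s$$ uses uppercase nucleotide letters (this is not necessarily the intended reference solution), is a single pass that keeps the running skew, the lowest skew seen so far, and every prefix length at which that lowest value occurs. Because prefix lengths are visited in order, the collected positions are automatically increasing.

```python
def minimum_skew(s):
    """Return, in increasing order, all i in 0..len(s) minimizing skew(prefix_i(s)).

    Sketch only; assumes s contains uppercase nucleotides.
    """
    skew = 0          # skew of the current prefix, starting with the empty prefix
    lowest = 0        # smallest skew seen so far (the empty prefix has skew 0)
    positions = [0]   # prefix lengths i at which the smallest skew occurs
    for i, base in enumerate(s, start=1):
        if base == 'G':
            skew += 1
        elif base == 'C':
            skew -= 1
        if skew < lowest:
            # new minimum: discard earlier positions
            lowest = skew
            positions = [i]
        elif skew == lowest:
            # ties with the current minimum are kept as well
            positions.append(i)
    return tuple(positions)


print(minimum_skew('CATGGGCATCGGCCATACGCC'))  # (21,)
```

This runs in linear time and constant extra space apart from the returned positions, since no list of all prefix skews is built.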

Example

In the following interactive session, we assume the FASTA file data.fna to be located in the current directory.

>>> minimum_skew('CATGGGCATCGGCCATACGCC')
(21,)

>>> from Bio import SeqIO
>>> minimum_skew(*SeqIO.parse('data.fna', 'fasta'))
(53, 97)