Define the skew of a DNA string $$s$$, denoted $$\text{skew}(s)$$, as the number of occurrences of the nucleotide G in $$s$$ minus the number of occurrences of the nucleotide C. Let $$\text{prefix}_i(s)$$ denote the prefix (i.e., initial substring) of $$s$$ of length $$i$$. For example, if
$$s$$ = CATGGGCATCGGCCATACGCC
the values of $$\text{skew}(\text{prefix}_i(s))$$ for $$i = 0, 1, \ldots, 21$$ are
0 -1 -1 -1 0 1 2 1 1 1 0 1 2 1 0 0 0 0 -1 0 -1 -2
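The list of values above can be reproduced with a running sum: each G adds 1, each C subtracts 1, and every other nucleotide leaves the skew unchanged. The following is a minimal sketch, not part of the original statement; the helper name `prefix_skews` is hypothetical, and it assumes Python 3.8 or later for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate


def prefix_skews(s):
    """Return skew(prefix_i(s)) for i = 0, 1, ..., len(s), where skew = #G - #C.

    Hypothetical helper for illustration only.
    """
    # +1 for G, -1 for C, 0 for any other character (bool arithmetic yields ints)
    steps = ((base == 'G') - (base == 'C') for base in s.upper())
    # initial=0 supplies the skew of the empty prefix (i = 0)
    return list(accumulate(steps, initial=0))


print(prefix_skews('CATGGGCATCGGCCATACGCC'))
# [0, -1, -1, -1, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, -1, 0, -1, -2]
```

The result has one more entry than the string has characters, because the empty prefix ($$i = 0$$) is included.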
In the following interactive session, we assume the FASTA file data.fna is located in the current directory.
>>> minimum_skew('CATGGGCATCGGCCATACGCC')
(21,)
>>> from Bio import SeqIO
>>> minimum_skew(*SeqIO.parse('data.fna', 'fasta'))
(53, 97)
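One possible reading of the session is that `minimum_skew` returns a tuple of all positions $$i$$ at which $$\text{skew}(\text{prefix}_i(s))$$ reaches its minimum. The sketch below follows that reading and is an assumption, not the reference solution. It also assumes that the argument is either a plain string or an object with a `seq` attribute (such as a Biopython `SeqRecord`), and that data.fna holds exactly one record, so that the `*` unpacking passes a single argument.

```python
def minimum_skew(seq):
    """Return a tuple of all positions i where skew(prefix_i(seq)) is minimal.

    Assumed behaviour, inferred from the example session only.
    """
    # Assumption: accept a plain string, or an object exposing its
    # sequence through a .seq attribute (e.g. a Biopython SeqRecord).
    s = str(getattr(seq, 'seq', seq)).upper()

    skew = 0          # skew of the current prefix
    lowest = 0        # lowest skew seen so far (empty prefix has skew 0)
    positions = [0]   # prefix lengths at which the lowest skew occurs

    for i, base in enumerate(s, start=1):
        skew += (base == 'G') - (base == 'C')
        if skew < lowest:
            # New minimum: discard earlier positions
            lowest, positions = skew, [i]
        elif skew == lowest:
            positions.append(i)

    return tuple(positions)


print(minimum_skew('CATGGGCATCGGCCATACGCC'))
# (21,)
```

A single pass suffices because the list of best positions is reset whenever a strictly lower skew is found, so the full list of prefix skews never has to be stored.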