In 1984, US Secretary of Health and Human Services (HHS) Margaret Heckler declared that an HIV vaccine would be available within two years, stating:
"Yet another terrible disease is about to yield to patience, persistence and outright genius."
In 1997, Bill Clinton established a new research center at the National Institutes of Health with the goal of developing an HIV vaccine. In his words:
"It is no longer a question of whether we can develop an AIDS vaccine, it is simply a question of when."
In 2005, Merck began clinical trials of an HIV vaccine but discontinued them two years later after learning that the vaccine actually increased the risk of HIV infection in some recipients.
Today, despite those enormous investments and ongoing clinical trials, we are still far from an HIV vaccine, and 35 million people are living with the disease. Scientists have made great progress in developing a successful antiretroviral therapy, a drug cocktail that stabilizes an infected patient's symptoms. However, this therapy does not cure AIDS and cannot prevent the spread of HIV, and so it does not hold the promise of a true vaccine for containing the AIDS epidemic.
Since HIV mutates so fast, different HIV isolates may have different phenotypes, thus requiring different drug cocktails. For example, HIV isolates can be divided into fast-replicating syncytium-inducing (SI) isolates and slow-replicating non-syncytium-inducing (NSI) isolates. During infection, viral proteins like gp120, which HIV uses to enter the cell, are transported to the cell surface, where they can cause the host cell membrane to fuse with neighboring cells. This causes dozens of human cells to fuse their cell membranes into a giant, nonfunctional syncytium, or abnormal multinucleate cell (see figure). This mechanism also allows an SI virus to kill many human cells by infecting only one.
Because gp120 is important in classifying a virus as SI or NSI, biologists are interested in determining which amino acids in gp120 can be used for this classification. In 1992, Fouchier et al. analyzed a multiple alignment of the V3 loop region in gp120 and devised the 11/25 rule, which asserts that an HIV strain is more likely to have an SI phenotype if the amino acid at position 11 or position 25 of its V3 loop is arginine (R) or lysine (K).
CMRPGNNTRKSIHMGPGKAFYATGDIIGDIRQAHC
CMRPGNNTRKSIHMGPGRAFYATGDIIGDTRQAHC
CMRPGNNTRKSIHIGPGRAFYATGDIIGDIRQAHC
CMRPGNNTRKSIHIGPGRAFYTTGDIIGDIRQAHC
CTRPNNNTRKGISIGPGRAFIAARKIIGDIRQAHC
CTRPNNYTRKGISIGPGRAFIAARKIIGDIRQAHC
CTRPNNNTRKRIRMGPGRAFIAARKIIGDIRQAHC
CVRPNNYTRKRIGIGPGRTVFATKQIIGNIRQAHC
CTRPSNNTRKSIPVGPGKALYATGAIIGNIRQAHC
CTRPNNHTRKSINIGPGRAFYATGEIIGDIRQAHC
CTRPNNNTRKSINIGPGRAFYATGEIIGDIRQAHC
CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC
CTRPNNNTRKSINIGPGRAFYTTGEIIGNIRQAHC
CIRPNNNTRGSIHIGPGRAFYATGDIIGEIRKAHC
CIRPNN-TRRSIHIGPGRAFYATGDIIGEIRKAHC
CTRPGSTTRRHIHIGPGRAFYATGNILGSIRKAHC
CTRPGSTTRRHIHIGPGRAFYATGNI-GSIRKAHC
CTGPGSTTRRHIHIGPGRAFYATGNIHG-IRKGHC
CMRPGNNTRRRIHIGPGRAFYATGNI-GNIRKQHC
CMRPGTTTRRRIHIGPGRAFYATGNI-GNIRKAHC
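The 11/25 rule can be sketched as a small Python check. The function name `likely_si` is hypothetical, and the sketch assumes that positions are counted from 1 within the string as given; it does not correct for gap characters (`-`) in aligned sequences, which shift the positions of the amino acids that follow them.

```python
def likely_si(v3_loop):
    """Return True if position 11 or 25 (1-based) holds R or K.

    Hypothetical helper: positions are counted in the string as given,
    so a gap character '-' is treated like any other non-matching symbol.
    """
    return any(
        len(v3_loop) >= position and v3_loop[position - 1] in "RK"
        for position in (11, 25)
    )


# First and seventh sequence of the alignment above.
print(likely_si("CMRPGNNTRKSIHMGPGKAFYATGDIIGDIRQAHC"))  # False (S at 11, D at 25)
print(likely_si("CTRPNNNTRKRIRMGPGRAFIAARKIIGDIRQAHC"))  # True (R at 11, K at 25)
```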
A protein is represented as a sequence of uppercase letters, each of which corresponds to one amino acid of the protein.
A variant of a protein is represented as a string that consists of the digits of a number $$p \in \mathbb{N}_0$$ followed by one or more uppercase letters. The number $$p$$ indicates the position of the variant in the protein sequence, with the positions of the amino acids numbered from left to right starting from 1. The variant occurs in the protein if the amino acid at the given position $$p$$ corresponds to one of the uppercase letters in the description of the variant. As such, the variant 11KR occurs in a protein if its eleventh amino acid is lysine (K) or arginine (R).
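As a minimal sketch of this definition, a variant description can be split into its position and its letters, after which the occurrence test is a single lookup. The names `parse_variant` and `occurs` are hypothetical. Treating a position beyond the end of the protein as "does not occur" is an assumption, since the text does not say what should happen in that case.

```python
def parse_variant(description):
    """Split a description such as '11KR' into (11, 'KR')."""
    digits = ""
    for char in description:
        if not char.isdigit():
            break
        digits += char
    # Assumes the description starts with at least one digit, as specified.
    return int(digits), description[len(digits):]


def occurs(description, protein):
    """Return True if the variant occurs in the protein (1-based position)."""
    position, letters = parse_variant(description)
    # Assumption: a position outside the protein never matches.
    return 1 <= position <= len(protein) and protein[position - 1] in letters


protein = "CVRPNNYTRKRIGIGPGRTVFATKQIIGNTRQAHC"
print(parse_variant("11KR"))    # (11, 'KR')
print(occurs("11KR", protein))  # True: the eleventh amino acid is R
print(occurs("25KR", protein))  # False: the 25th amino acid is Q
```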
The first three lines of the input contain the following information:
the amino acid sequence of a protein $$e$$
a number $$m \in \mathbb{N}_0$$
a number $$n \in \mathbb{N}_0$$ ($$n \geq m$$)
This is followed by another $$n$$ lines that each contain the description of a single variant in the protein. All positions $$p$$ of these variants are different.
Output a diagnosis based on the number of variants that occur in the given protein $$e$$. The diagnosis is positive if at least $$m$$ variants occur in the protein $$e$$; otherwise the diagnosis is negative. The diagnosis is followed by a space and, in parentheses, the number of variants that occur in the protein $$e$$.
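One possible way to put the input and output description together is sketched below. The function names `count_occurring` and `diagnose` are hypothetical, and the input is passed as a list of lines rather than read from standard input, which keeps the example easy to try out. As an assumption, a position outside the protein is counted as not occurring.

```python
def count_occurring(protein, variants):
    """Count the variant descriptions (e.g. '11KR') that occur in protein."""
    count = 0
    for variant in variants:
        letters = variant.lstrip("0123456789")  # what remains after the position
        position = int(variant[:len(variant) - len(letters)])
        # Assumption: a position outside the protein never matches.
        if 1 <= position <= len(protein) and protein[position - 1] in letters:
            count += 1
    return count


def diagnose(lines):
    """Build the diagnosis from the input lines: protein, m, n, then n variants."""
    protein = lines[0].strip()
    m = int(lines[1])
    n = int(lines[2])
    variants = [line.strip() for line in lines[3:3 + n]]
    found = count_occurring(protein, variants)
    verdict = "positive" if found >= m else "negative"
    return f"{verdict} ({found})"


print(diagnose(["CVRPNNYTRKRIGIGPGRTVFATKQIIGNTRQAHC", "1", "2", "11KR", "25KR"]))
# positive (1)
```

To process real input, the list could be obtained with `sys.stdin.read().splitlines()` and handed to `diagnose` unchanged.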
Input:
CMRPGNNTRKSTHMGPGKAFYAICDTIGDIRGAHC
1
2
11KR
25KR
Output:
negative (0)
Input:
CVRPNNYTRKRIGIGPGRTVFATKQIIGNTRQAHC
1
2
11KR
25KR
Output:
positive (1)
Input:
CTRPNNNTRKRTSIIGPGRAFTAARKTIGDIRQAHC
1
2
11KR
25KR
Output:
positive (2)