In 1954, James Watson (PRO) and Francis Crick (TYR) formed the RNA Tie Club, a scientific gentlemen's club whose mission was to solve the riddle of the structure of RNA and to understand how it builds proteins. The club had 20 members, each of whom was designated by an amino acid (the building blocks of proteins).
member | training | tie designation |
---|---|---|
George Gamow | physicist | ALA |
Alexander Rich | biochemist | ARG |
Paul Doty | physical chemist | ASP |
Robert Ledley | mathematical biophysicist | ASN |
Martynas Ycas | biochemist | CYS |
Robley Williams | electron microscopist | GLU |
Alexander Dounce | biochemist | GLN |
Richard Feynman | theoretical physicist | GLY |
Melvin Calvin | chemist | HIS |
Norman Simmons | biochemist | ISO |
Edward Teller | physicist | LEU |
Erwin Chargaff | biochemist | LYS |
Nicholas Metropolis | physicist, mathematician | MET |
Gunther Stent | physical chemist | PHE |
James Watson | biologist | PRO |
Harold Gordon | biologist | SER |
Leslie Orgel | theoretical chemist | THR |
Max Delbrück | theoretical physicist | TRY |
Francis Crick | biologist | TYR |
Sydney Brenner | biologist | VAL |
In his memoirs, George Gamow (ALA) recalled:
We were just drinking California wine and we got the idea.
Each member was given a black woolen necktie with an RNA helix embroidered in green and yellow (photograph, left to right: Francis Crick (TYR), Alexander Rich (ARG), Leslie Orgel (THR) and James Watson (PRO)).
Each member also received a gold tiepin with the three-letter abbreviation of his amino acid, which led several people to ask George Gamow (ALA) why his pin bore the wrong monogram.
Adopting the motto "Do or die, or don't try", they met twice a year to share ideas, cigars and alcohol. Several members of the RNA Tie Club went on to become Nobel Prize laureates, but it fell to Marshall Nirenberg, a non-member, to finally decipher the genetic code that forms the link between nucleic acids and amino acids.
We work with a secret code that the members of a club can use to exchange messages that are gibberish to non-members. The club members are registered in a comma-separated values (CSV) file whose first column contains the name of a club member and whose third column contains a designation that consists of one or more uppercase letters. Each club member has a unique designation.
A secret message is represented by a sequence (list or tuple) of codes, where each code is a string (str) that contains a designation $$d$$ (one or more uppercase letters) that corresponds to a club member, followed by a position $$p$$ (one or more digits). Possible codes are GLU3, GLY2 or ALA10. To decode the secret message, each code must be replaced by the $$p$$-th letter in the name of the club member that corresponds to designation $$d$$. All letters in the decoded message are uppercase. For example, for the members of the RNA Tie Club, the code GLU3 corresponds with the letter B (third letter of Robley Williams), the code GLY2 with the letter I (second letter of Richard Feynman), and ALA10 with the letter O (tenth letter of George Gamow). Your task:
Write a function read_designations that takes the location (str) of a CSV file. The given CSV file must contain the registered club members in the format described above. The function must return a dictionary (dict) that maps all designations (str) from the given CSV file onto the names (str) of the corresponding club members. The names of the club members should be reduced to letters only, and converted to uppercase.
Write a function split_code that takes a code (str) containing a designation (one or more uppercase letters) followed by a position (one or more digits). The function must return a tuple containing the designation (str) and the position (int) as separate elements. If the argument does not represent a valid code, the function must raise an AssertionError with the message invalid code.
Write a function decode that takes two arguments: i) a secret message (a list or tuple of codes, as described above) and ii) the dictionary (dict; as returned by the function read_designations) that maps the designations of the club members used to encode the secret message onto their names. The function must return the decoded message (str).
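One possible way to approach the three functions is sketched below. It is a minimal sketch under a few assumptions that the description above does not settle: the CSV file is assumed to be UTF-8 encoded, rows whose third column is not made up of uppercase letters (for example a header row) are skipped, and accented letters are reduced to their base letter (so that Delbrück becomes DELBRUCK). The helper `clean_name` is a hypothetical name introduced only for this illustration.

```python
import csv
import re
import unicodedata


def clean_name(name):
    """Reduce a name to uppercase letters only (hypothetical helper).

    Assumption: accented letters are decomposed first, so that the
    combining marks are dropped and only the base letter remains.
    """
    decomposed = unicodedata.normalize('NFKD', name)
    return ''.join(ch for ch in decomposed if ch.isalpha()).upper()


def read_designations(location):
    """Map each designation (third column) onto the cleaned name (first column)."""
    designations = {}
    # Assumption: the file is UTF-8 encoded.
    with open(location, newline='', encoding='utf-8') as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue  # skip blank or incomplete lines
            designation = row[2].strip()
            # Assumption: rows without a valid designation (e.g. a header) are skipped.
            if not re.fullmatch(r'[A-Z]+', designation):
                continue
            designations[designation] = clean_name(row[0])
    return designations


def split_code(code):
    """Split a code such as 'ALA10' into the tuple ('ALA', 10)."""
    match = None
    if isinstance(code, str):
        # One or more uppercase letters followed by one or more digits.
        match = re.fullmatch(r'([A-Z]+)([0-9]+)', code)
    if match is None:
        raise AssertionError('invalid code')
    return match.group(1), int(match.group(2))


def decode(message, designations):
    """Replace each code by the p-th letter (1-based) of the member's name."""
    letters = []
    for code in message:
        designation, position = split_code(code)
        letters.append(designations[designation][position - 1])
    return ''.join(letters)
```

Raising AssertionError explicitly, rather than through an assert statement, keeps the check active even when Python runs with assertions disabled. With a dictionary built from the members listed in the table above, decode(['GLU3', 'GLY2', 'ALA10'], designation) returns 'BIO' in this sketch.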
In the following interactive session, we assume the CSV file RnaTieClub.csv to be located in the current directory.
>>> designation = read_designations('RnaTieClub.csv')
>>> designation['GLU']
'ROBLEYWILLIAMS'
>>> designation['GLY']
'RICHARDFEYNMAN'
>>> designation['ALA']
'GEORGEGAMOW'
>>> split_code('GLU3')
('GLU', 3)
>>> split_code('GLY2')
('GLY', 2)
>>> split_code('ALA10')
('ALA', 10)
>>> split_code('R2D2')
Traceback (most recent call last):
AssertionError: invalid code
>>> decode(['GLU3', 'GLY2', 'ALA10', 'ASP4', 'ASP6', 'THR9', 'HIS11', 'PHE8', 'PHE4'], designation)
'BIOLOGIST'
>>> decode(('MET14', 'SER1', 'CYS5', 'PRO9', 'LYS4', 'HIS7', 'GLU11', 'GLU14', 'PHE4'), designation)
'PHYSICIST'
>>> decode(['CYS10', 'MET4', 'ARG8', 'ISO4', 'GLU8', 'MET18', 'PHE12'], designation)
'CHEMIST'
>>> decode(['THR9', 'PHE6', 'THR7', 'LEU9', 'GLU2', 'ALA1', 'TYR6', 'GLU14', 'ASP7'], designation)
'GEOLOGIST'
>>> decode(['THR9', 'ARG3', 'MET5', 'ALA5', 'ASN5', 'CYS2', 'MET14', 'GLY4', 'ASN11', 'ASN1'], designation)
'GEOGRAPHER'
>>> decode(['LYS11', 'MET8', 'ASP7', 'PHE7', 'ALA10', 'ISO11', 'ASN2', 'MET9', 'MET10', 'LEU5'], designation)
'ASTRONOMER'
>>> decode(['CYS12', 'PRO8', 'LYS8', 'CYS4', 'MET17', 'ISO12', 'PHE12', 'MET17', 'LYS6', 'MET17', 'PRO2', 'HIS6'], designation)
'STATISTICIAN'
>>> decode(['VAL7', 'ARG11', 'ALA3', 'MET3', 'ARG13', 'ASN11', 'HIS1', 'HIS5', 'PHE8', 'MET11'], designation)
'BIOCHEMIST'
>>> decode(['GLY12', 'ISO5', 'PHE9', 'GLY4', 'ASN8', 'ISO9', 'SER2', 'LEU7', 'LYS4', 'CYS10', 'TYR10', 'ALA8', 'GLN13'], designation)
'MATHEMATICIAN'
>>> decode(['ARG12', 'SER8', 'GLU13', 'ASP1', 'TRY9', 'PHE9', 'THR10', 'LEU12', 'MET8', 'GLN14', 'ISO8', 'ALA6', 'TYR4', 'LEU7', 'HIS5', 'CYS8', 'LEU7'], designation)
'COMPUTERSCIENTIST'