You have set up a laboratory experiment where you make use of a particle accelerator for bombarding bismuth atoms with zinc. The atomic collisions generated by the experiment result in a large number of chemical elements. Since all of these elements remained unknown until today, this discovery will undoubtedly yield the Nobel Prize in Chemistry. In order to publish your results, you still need to come up with new names for each of the new elements.
Because you want to assign names that are in line with the names of existing chemical elements, you decide to proceed in the following way. Chemical names are written with a capital letter, followed by two or more lowercase letters. So you stick to this usage of uppercase and lowercase letters. To recognize suffixes of element names, you (temporarily) append an underscore (_) to the names of the existing chemical elements. Then, you proceed as follows:
1. Randomly pick a name of an existing chemical element, and take the first three letters of that name as the initial letters of the new name you are going to construct.
2. Take the last two letters of the provisional new name, and search the names of the existing chemical elements for all possible characters (lowercase letters or underscores) that follow this bigram. Randomly pick a character from the list of candidates, and append it to the provisional new name.
3. If you have chosen an underscore during step 2, the new name is considered to be complete. Of course, in that case the underscore must be removed from the name. Otherwise, keep on repeating step 2 until an underscore was chosen.
If the above procedure yields a name that was already assigned to an existing chemical element, you simply repeat the whole process until it produces a new name.
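The bigram lookup in step 2 can be sketched as follows. This is only an illustration, not part of the required class: the function name followers and the three example names are assumptions chosen for the demonstration.

```python
def followers(bigram, names):
    """Collect every character that follows `bigram` in the given names.

    Hypothetical helper: each name is temporarily extended with an
    underscore, so a bigram at the end of a name yields '_' as a candidate.
    """
    candidates = set()
    for name in names:
        marked = name + '_'          # the underscore marks the end of the name
        start = marked.find(bigram)
        while start != -1:           # a bigram may occur more than once in a name
            candidates.add(marked[start + 2])
            start = marked.find(bigram, start + 1)
    return candidates


# Illustrative example names (an assumption, not the content of any file).
examples = ['Bismuth', 'Ruthenium', 'Lanthanum']
result = followers('th', examples)   # contains '_', 'e' and 'a'
```

Because the bigram always consists of lowercase letters, it never matches the capital letter at the start of a name, and the character at position start + 2 always exists thanks to the appended underscore.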
Define a class NameGenerator that can be used to generate new names based on a sequence of example names, following the procedure outlined in the introduction. The objects of this class should at least have the following properties and methods:
Each object of the class NameGenerator must have properties prefixes and triples. Upon creation of a new object these properties must respectively reference an empty set and an empty dictionary. The following two methods will be used to modify the content of these properties.
A method add_name that can be used to add a new example name to the generator. This example name must be passed as an argument to the method. The method must add a string containing the first three letters of the example name to the set of prefixes (property prefixes), and must update the dictionary of triples (property triples). The latter is done by looking up each successive pair of lowercase letters as a key in the dictionary, and adding the letter following the pair in the example name (or an underscore for the final pair of letters) to the set that is mapped to the key by the dictionary. If the pair of letters did not yet occur as a key in the dictionary, a new key/value pair must be added to the dictionary, with the value being a set containing the letter following the pair of letters in the example name. The method must raise an AssertionError with the message invalid name if the example name that is passed as an argument does not consist of a capital letter followed by two or more lowercase letters.
A method add_names that takes the location of a text file as an argument. This text file must contain a list of names, each on a separate line. The method must use each of these names to update the properties prefixes and triples, following the description given for the method add_name.
A method name that takes no arguments. This method should return a new name that was generated following the procedure outlined in the introduction. Of course, step 1 of the procedure should make use of the property prefixes, and step 2 should make use of the property triples. The example names that were passed when calling the method add_name or that are contained in a text file that was passed when calling the method add_names, are not considered to be new names and thus are not allowed to be returned by this method.
In implementing these methods you should make sure to make optimal reuse of the methods that have already been implemented.
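One possible shape for such a class is sketched below. It is a minimal sketch under stated assumptions, not a reference solution: the private attribute _examples (used to remember the example names) and the regular expression for validation are choices made here, not requirements of the assignment.

```python
import random
import re


class NameGenerator:
    def __init__(self):
        self.prefixes = set()
        self.triples = {}
        self._examples = set()   # assumption: remembers names that may not be returned

    def add_name(self, name):
        # A valid name is one capital letter followed by two or more lowercase letters.
        assert re.fullmatch(r'[A-Z][a-z]{2,}', name), 'invalid name'
        self.prefixes.add(name[:3])
        self._examples.add(name)
        marked = name + '_'      # temporary underscore marks the end of the name
        # Start at index 1 so that only pairs of lowercase letters become keys.
        for i in range(1, len(name) - 1):
            self.triples.setdefault(marked[i:i + 2], set()).add(marked[i + 2])

    def add_names(self, filename):
        # Reuses add_name for every line of the file.
        with open(filename) as lines:
            for line in lines:
                self.add_name(line.strip())

    def name(self):
        while True:
            # Step 1: random prefix; steps 2 and 3: extend until '_' is drawn.
            candidate = random.choice(sorted(self.prefixes))
            while not candidate.endswith('_'):
                candidate += random.choice(sorted(self.triples[candidate[-2:]]))
            candidate = candidate[:-1]          # remove the underscore again
            if candidate not in self._examples:
                return candidate
```

Two caveats apply to this sketch. First, name keeps retrying until it finds a name that is not an example, so it does not terminate if the examples allow no new name at all (for instance, when only a single example name has been added). Second, add_names assumes every line holds a valid name; a blank line would raise the AssertionError.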
In the following interactive session we assume that the file shortlist_elements.txt is located in the current directory.
>>> chemGen = NameGenerator()
>>> chemGen.add_name('Osmium')
>>> chemGen.prefixes
{'Osm'}
>>> chemGen.triples
{'sm': {'i'}, 'mi': {'u'}, 'iu': {'m'}, 'um': {'_'}}
>>> chemGen.add_name('bismuth')
Traceback (most recent call last):
AssertionError: invalid name
>>> chemGen.add_name('zINC')
Traceback (most recent call last):
AssertionError: invalid name
>>> chemGen.add_name('pH')
Traceback (most recent call last):
AssertionError: invalid name
>>> chemGen.add_name('Bismuth')
>>> chemGen.prefixes
{'Osm', 'Bis'}
>>> chemGen.triples
{'sm': {'u', 'i'}, 'mi': {'u'}, 'iu': {'m'}, 'um': {'_'}, 'is': {'m'}, 'mu': {'t'}, 'ut': {'h'}, 'th': {'_'}}
>>> chemGen.add_names('shortlist_elements.txt')
>>> chemGen.prefixes
{'Tha', 'Tel', 'Lan', 'Rut', 'Plu', 'Unu', 'Osm', 'Bis'}
>>> chemGen.triples
{'sm': {'u', 'i'}, 'mi': {'u'}, 'iu': {'m'}, 'um': {'_'}, 'is': {'m'}, 'mu': {'t'}, 'ut': {'o', 'h'}, 'th': {'e', '_', 'a'}, 'he': {'n', 'r', 'x'}, 'en': {'i'}, 'ni': {'u'}, 'an': {'u', 't'}, 'nt': {'h'}, 'ha': {'n', 'l'}, 'nu': {'n', 'm'}, 'al': {'l'}, 'll': {'u', 'i'}, 'li': {'u'}, 'el': {'l'}, 'lu': {'r', 't'}, 'ur': {'i'}, 'ri': {'u'}, 'to': {'n'}, 'on': {'i'}, 'er': {'f'}, 'rf': {'o'}, 'fo': {'r'}, 'or': {'d'}, 'rd': {'i'}, 'di': {'u'}, 'un': {'h'}, 'nh': {'e'}, 'ex': {'i'}, 'xi': {'u'}}
>>> chemGen.name()
'Osmuthalluthexium'
>>> chemGen.name()
'Ruthanthanium'
>>> chemGen.name()
'Lantherfordium'
>>> chemGen.name()
'Thanthenium'