The keypad of a mobile phone consists of 12 keys: numbered keys 0–9 and two additional keys (* and #). To enter text on such a keypad, the letters A–Z are spread over keys 2–9 in alphabetic order. The SPACE character is often assigned to the 0-key, but this varies depending on the phone. As the alphabet has 26 letters, three or four letters are grouped on each key, and, so, ambiguity arises.

The 12-key keypad of a mobile phone as used by LetterWise has a special NEXT-key at the bottom right of the keypad.

LetterWise¹ is a predictive text entry system that was developed to overcome this ambiguity. Unlike other predictive text entry systems such as Multitap² or T9³, LetterWise does not depend on a built-in dictionary, allowing the user to type anything, yet with very high efficiency. Being a very simple system to use, its instruction manual is only one sentence:

Hit the key with the letter you want and if it doesn't come up, hit Next until it does.

LetterWise works with a stored database of probabilities of prefixes to guess the letter that you want to enter. A prefix is the letters preceding the current keystroke. For example, if the user presses 3 with the prefix th, the most likely next letter is e because the in English is far more probable than either thd or thf. LetterWise occasionally guesses the wrong letter, and in these cases the user must press a special NEXT-key to choose the next mostly likely letter for the given key and prefix.

LetterWise uses a separate database for each language (English, Dutch, …), which maps prefixes of zero, one, two and three letters onto an alphabet. Such an alphabet contains all 26 uppercase letters according to decreasing probability of occurrence after the corresponding prefix. For example, in the table below you see some mappings from the English database.

prefix	alphabet
	`EIASNORTLCUPDMHGYBFVKWZXQJ`
`W`	`AIEOHNRSLDBKTYMFUPWCGZJQVX`
`WA`	`RYTLNSIGBKDVMXUFCPHWEZAJOQ`
`WAT`	`ECTHASUFINORLPDMGYBVKWZXQJ`
`ATE`	`DSRLNCGMEAFUWTBIOHPYXKVZQJ`

Only the most common prefixes are included in the database, as a way to save memory. However, it is guaranteed that the database always maps the empty prefix to an alphabet (first row in the above table). If a prefix does not appear in the database, by definition it is linked to the alphabet corresponding to the empty prefix.

To enter the letter R after the letters WATE have been entered, LetterWise requires you to enter the 7-key followed by the NEXT-key.

For example, say we want to enter the word WATER and we have already entered the prefix WATE. LetterWise uses at most three letters at the end of the prefix (ATE) to lookup the corresponding alphabet in the database (DSRLNCGMEAFUWTBIOHPYXKVZQJ). To enter the letter R, we tap the 7-key that is marked with the group of letters PQRS. From the alphabet that corresponds to the prefix ATE, LetterWise guesses that we intend to enter the letter S because out of the four letters on the 7-key it is the one that appears first in the alphabet. Therefore, we also have to tap the NEXT-key in this case to select the next letter on the 7-key from the alphabet. That turns out to be the letter R.

Entering a letter with LetterWise therefore comes down to tapping a digit-key, followed by zero or more taps on the NEXT-key. Such a combination of keys that must be tapped to enter a single letter is represented as a string (str) that starts with a digit (corresponding to the digit-key containing a group of letters), followed by zero or more repetitions of the letter N (NEXT-key). As a result, these are the combinations that must be tapped to enter the word WATER when the English version of the LetterWise database is used:

prefix	combination	alphabet
	`9N`	`EIASNORTLCUPDMHGYBFVKWZXQJ`
`W`	`2`	`AIEOHNRSLDBKTYMFUPWCGZJQVX`
`WA`	`8`	`RYTLNSIGBKDVMXUFCPHWEZAJOQ`
`WAT`	`3`	`ECTHASUFINORLPDMGYBVKWZXQJ`
`WATE`	`7N`	`DSRLNCGMEAFUWTBIOHPYXKVZQJ`

Finally, we note that when using the NEXT-key to run through the letters grouped on a digit-key, the last letter of the group is again followed by the first letter of the group. For example, for prefix WATE we get the letter S with combination 7, the letter R with combination 7N, the letter P with combination 7NN, the letter Q with combination 7NNN, and again the letter S with combination 7NNNN.

Assignment

The LetterWise database for a particular language is stored in a text file, with each line containing a prefix, followed by a comma (,) and the alphabet that corresponds to the prefix. The example below shows a selection of the lines from the English version of the LetterWise database (en.txt⁴).

,EIASNORTLCUPDMHGYBFVKWZXQJ
A,TLNRCSBMPDGIUVEKYFWZHXOQJA
AA,LRMSNTDGPIHBFKCEAUWZOYVXQJ
AAB,ADEISNORTLCUPMHGYBFVKWZXQJ
AAD,EIZASNORTLCUPDMHGYBFVKWXQJ
…
ATD,ORAEISNTLCUPDMHGYBFVKWZXQJ
ATE,DSRLNCGMEAFUWTBIOHPYXKVZQJ
ATF,OIUAELSNRTCPDMHGYBFVKWZXQJ
…
VZO,NKEIASORTLCUPDMHGYBFVWZXQJ
W,AIEOHNRSLDBKTYMFUPWCGZJQVX
WA,RYTLNSIGBKDVMXUFCPHWEZAJOQ
WAA,RCPGEIASNOTLUDMHYBFVKWZXQJ
WAB,LBISEARTNOCUPDMHGYFVKWZXQJ
…
WAS,HTPSENOAKIRLCUDMGYBFVWZXQJ
WAT,ECTHASUFINORLPDMGYBVKWZXQJ
WAU,LKCGRNBFSPMVEIAOTUDHYWZXQJ
…

Your task:

Write a function read_database that takes the location of a text file (str) containing the LetterWise database for a particular language. The function must return a dictionary (dict) that maps each prefix (str) from the database onto the corresponding alphabet (str).
Write a function combinations that takes the concatenated combinations (str) that must be tapped to enter the individual letters of a word using LetterWise. The function must return a list (list) containing the individual combinations (str).
Write a function letter that takes three arguments: i) a prefix (str), ii) a combination (str) of keys that are tapped to enter a single letter with LetterWise and iii) a dictionary (dict) that maps all prefixes (str) from a LetterWise database to their corresponding alphabets (str). The function must return the uppercase letter (str) that LetterWise assigns based on the given prefix, combination and database. We want to underscore once more that LetterWise only uses the last three letters if the given prefix contains more than three letters, and that the prefix is mapped to the alphabet corresponding to the empty prefix if the prefix does not appear in the database.
Write a function word that takes two arguments: i) the concatenated combinations (str) that must be tapped to enter the individual letters of a word using LetterWise and ii) a dictionary (dict) that maps all prefixes (str) from a LetterWise database onto their corresponding alphabets (str). The function must return the word (str, uppercase) that LetterWise assigns based on the given key combinations and the given database.

Example

In the following interactive session we assume the text files en.txt⁵ and nl.txt⁶ to be located in the current directory.

        >>> database = read_database('en.txt7')
>>> database['']
'EIASNORTLCUPDMHGYBFVKWZXQJ'
>>> database['W']
'AIEOHNRSLDBKTYMFUPWCGZJQVX'
>>> database['WA']
'RYTLNSIGBKDVMXUFCPHWEZAJOQ'
>>> database['WAT']
'ECTHASUFINORLPDMGYBVKWZXQJ'
>>> database['ATE']
'DSRLNCGMEAFUWTBIOHPYXKVZQJ'
>>> database['AAA']
Traceback (most recent call last):
KeyError: 'AAA'

>>> letter('', '9', database)
'Y'
>>> letter('', '9N', database)
'W'
>>> letter('W', '2', database)
'A'
>>> letter('WA', '8', database)
'T'
>>> letter('WAT', '3', database)
'E'
>>> letter('WATE', '7', database)
'S'
>>> letter('WATE', '7N', database)
'R'
>>> letter('WATE', '7NN', database)
'P'
>>> letter('WATE', '7NNN', database)
'Q'
>>> letter('WATE', '7NNNN', database)
'S'

>>> combinations('9N2837N')
['9N', '2', '8', '3', '7N']
>>> combinations('4N25837N83553')
['4N', '2', '5', '8', '3', '7N', '8', '3', '5', '5', '3']
>>> combinations('4N65N524N36')
['4N', '6', '5N', '5', '2', '4N', '3', '6']
>>> combinations('92837')
['9', '2', '8', '3', '7']
>>> combinations('4NN25837N83553')
['4NN', '2', '5', '8', '3', '7N', '8', '3', '5', '5', '3']
>>> combinations('4NN65N524N3N6')
['4NN', '6', '5N', '5', '2', '4N', '3N', '6']

>>> word('9N2837N', database)
'WATER'
>>> word('4N25837N83553', database)
'HALTESTELLE'
>>> word('4N65N524N36', database)
'HOKKAIDO'

>>> database = read_database('nl.txt8')
>>> word('92837', database)
'WATER'
>>> word('4NN25837N83553', database)
'HALTESTELLE'
>>> word('4NN65N524N3N6', database)
'HOKKAIDO'

Resources

MacKenzie IS, Kober H, Smith D, Jones T, Skepner E (2001). LetterWise: prefix-based disambiguation for mobile text input. Proceedings of the ACM Symposium on User Interface Software and Technology - UIST 2001, 111-120. ⁹