The keypad of a mobile phone consists of 12 keys: numbered keys 09 and two additional keys (* and #). To enter text on such a keypad, the letters AZ are spread over keys 29 in alphabetic order. The SPACE character is often assigned to the 0-key, but this varies depending on the phone. As the alphabet has 26 letters, three or four letters are grouped on each key, and, so, ambiguity arises.

keypad
The 12-key keypad of a mobile phone as used by LetterWise has a special NEXT-key at the bottom right of the keypad.

LetterWise1 is a predictive text entry system that was developed to overcome this ambiguity. Unlike other predictive text entry systems such as Multitap2 or T93, LetterWise does not depend on a built-in dictionary, allowing the user to type anything, yet with very high efficiency. Being a very simple system to use, its instruction manual is only one sentence:

Hit the key with the letter you want and if it doesn't come up, hit Next until it does.

LetterWise works with a stored database of probabilities of prefixes to guess the letter that you want to enter. A prefix is the letters preceding the current keystroke. For example, if the user presses 3 with the prefix th, the most likely next letter is e because the in English is far more probable than either thd or thf. LetterWise occasionally guesses the wrong letter, and in these cases the user must press a special NEXT-key to choose the next mostly likely letter for the given key and prefix.

LetterWise uses a separate database for each language (English, Dutch, …), which maps prefixes of zero, one, two and three letters onto an alphabet. Such an alphabet contains all 26 uppercase letters according to decreasing probability of occurrence after the corresponding prefix. For example, in the table below you see some mappings from the English database.

prefix alphabet
  EIASNORTLCUPDMHGYBFVKWZXQJ
W AIEOHNRSLDBKTYMFUPWCGZJQVX
WA RYTLNSIGBKDVMXUFCPHWEZAJOQ
WAT ECTHASUFINORLPDMGYBVKWZXQJ
ATE DSRLNCGMEAFUWTBIOHPYXKVZQJ

Only the most common prefixes are included in the database, as a way to save memory. However, it is guaranteed that the database always maps the empty prefix to an alphabet (first row in the above table). If a prefix does not appear in the database, by definition it is linked to the alphabet corresponding to the empty prefix.

letterwise
To enter the letter R after the letters WATE have been entered, LetterWise requires you to enter the 7-key followed by the NEXT-key.

For example, say we want to enter the word WATER and we have already entered the prefix WATE. LetterWise uses at most three letters at the end of the prefix (ATE) to lookup the corresponding alphabet in the database (DSRLNCGMEAFUWTBIOHPYXKVZQJ). To enter the letter R, we tap the 7-key that is marked with the group of letters PQRS. From the alphabet that corresponds to the prefix ATE, LetterWise guesses that we intend to enter the letter S because out of the four letters on the 7-key it is the one that appears first in the alphabet. Therefore, we also have to tap the NEXT-key in this case to select the next letter on the 7-key from the alphabet. That turns out to be the letter R.

Entering a letter with LetterWise therefore comes down to tapping a digit-key, followed by zero or more taps on the NEXT-key. Such a combination of keys that must be tapped to enter a single letter is represented as a string (str) that starts with a digit (corresponding to the digit-key containing a group of letters), followed by zero or more repetitions of the letter N (NEXT-key). As a result, these are the combinations that must be tapped to enter the word WATER when the English version of the LetterWise database is used:

prefix combination alphabet
  9N EIASNORTLCUPDMHGYBFVKWZXQJ
W 2 AIEOHNRSLDBKTYMFUPWCGZJQVX
WA 8 RYTLNSIGBKDVMXUFCPHWEZAJOQ
WAT 3 ECTHASUFINORLPDMGYBVKWZXQJ
WATE 7N DSRLNCGMEAFUWTBIOHPYXKVZQJ

Finally, we note that when using the NEXT-key to run through the letters grouped on a digit-key, the last letter of the group is again followed by the first letter of the group. For example, for prefix WATE we get the letter S with combination 7, the letter R with combination 7N, the letter P with combination 7NN, the letter Q with combination 7NNN, and again the letter S with combination 7NNNN.

Assignment

The LetterWise database for a particular language is stored in a text file, with each line containing a prefix, followed by a comma (,) and the alphabet that corresponds to the prefix. The example below shows a selection of the lines from the English version of the LetterWise database (en.txt4).

,EIASNORTLCUPDMHGYBFVKWZXQJ
A,TLNRCSBMPDGIUVEKYFWZHXOQJA
AA,LRMSNTDGPIHBFKCEAUWZOYVXQJ
AAB,ADEISNORTLCUPMHGYBFVKWZXQJ
AAD,EIZASNORTLCUPDMHGYBFVKWXQJ
…
ATD,ORAEISNTLCUPDMHGYBFVKWZXQJ
ATE,DSRLNCGMEAFUWTBIOHPYXKVZQJ
ATF,OIUAELSNRTCPDMHGYBFVKWZXQJ
…
VZO,NKEIASORTLCUPDMHGYBFVWZXQJ
W,AIEOHNRSLDBKTYMFUPWCGZJQVX
WA,RYTLNSIGBKDVMXUFCPHWEZAJOQ
WAA,RCPGEIASNOTLUDMHYBFVKWZXQJ
WAB,LBISEARTNOCUPDMHGYFVKWZXQJ
…
WAS,HTPSENOAKIRLCUDMGYBFVWZXQJ
WAT,ECTHASUFINORLPDMGYBVKWZXQJ
WAU,LKCGRNBFSPMVEIAOTUDHYWZXQJ

Your task:

Example

In the following interactive session we assume the text files en.txt5 and nl.txt6 to be located in the current directory.

>>> database = read_database('en.txt7')
>>> database['']
'EIASNORTLCUPDMHGYBFVKWZXQJ'
>>> database['W']
'AIEOHNRSLDBKTYMFUPWCGZJQVX'
>>> database['WA']
'RYTLNSIGBKDVMXUFCPHWEZAJOQ'
>>> database['WAT']
'ECTHASUFINORLPDMGYBVKWZXQJ'
>>> database['ATE']
'DSRLNCGMEAFUWTBIOHPYXKVZQJ'
>>> database['AAA']
Traceback (most recent call last):
KeyError: 'AAA'

>>> letter('', '9', database)
'Y'
>>> letter('', '9N', database)
'W'
>>> letter('W', '2', database)
'A'
>>> letter('WA', '8', database)
'T'
>>> letter('WAT', '3', database)
'E'
>>> letter('WATE', '7', database)
'S'
>>> letter('WATE', '7N', database)
'R'
>>> letter('WATE', '7NN', database)
'P'
>>> letter('WATE', '7NNN', database)
'Q'
>>> letter('WATE', '7NNNN', database)
'S'

>>> combinations('9N2837N')
['9N', '2', '8', '3', '7N']
>>> combinations('4N25837N83553')
['4N', '2', '5', '8', '3', '7N', '8', '3', '5', '5', '3']
>>> combinations('4N65N524N36')
['4N', '6', '5N', '5', '2', '4N', '3', '6']
>>> combinations('92837')
['9', '2', '8', '3', '7']
>>> combinations('4NN25837N83553')
['4NN', '2', '5', '8', '3', '7N', '8', '3', '5', '5', '3']
>>> combinations('4NN65N524N3N6')
['4NN', '6', '5N', '5', '2', '4N', '3N', '6']

>>> word('9N2837N', database)
'WATER'
>>> word('4N25837N83553', database)
'HALTESTELLE'
>>> word('4N65N524N36', database)
'HOKKAIDO'

>>> database = read_database('nl.txt8')
>>> word('92837', database)
'WATER'
>>> word('4NN25837N83553', database)
'HALTESTELLE'
>>> word('4NN65N524N3N6', database)
'HOKKAIDO'

Resources