In Robert Chambers' novel The Tracer of Lost Persons1 (1906), Mr. Keen is trying to help Captain Harren find a young woman with whom he has become obsessed. During his search, Mr. Keen copies these strange symbols from a mysterious photograph.
At length he says:
It's the strangest cipher I ever encountered. The strangest I ever heard of. I have seen hundreds of ciphers — hundreds — secret codes of the State Department, secret military codes, elaborate Oriental ciphers, symbols used in commercial transactions, symbols used by criminals and every species of malefactor. And every one of them can be solved with time and patience and a little knowledge of the subject. But this … this is too simple.
The message indeed reveals the name of the young woman whom Captain Harren has been seeking. Who is she?
Her name is Edith Inwood.
The secret message reads as:
INEVERSAWYOUBUTONCEILOVEYOUEDITHINWOOD
or if we properly split it into words and sentences:
I NEVER SAW YOU BUT ONCE. I LOVE YOU. EDITH INWOOD.
In what follows, we explain how to use a box code to decipher the message. You can also find the explanation in Chapter X2 of Chalmers' novel — which, incidentally, inspired a radio program3 that ran for 18 years.
We label eight segments with the lowercase letters a–h and arrange them in the following way:
By selecting one or more segments, we can create a total of 255 different shapes. For example, we could agree that these shapes represent the digits 0–9:
It makes no difference in which order the segments (a–h) of a shape are listed: chaf, hafc and fahc all represent the same shape. Moreover, we could also agree that there are different shapes that represent the same digit. For example, we could shape the digit 9 in the following ways:
A box code is a cipher that has the different ways to shape the ten digits (0–9) documented in a text file. Each line of the file contains a digit, followed by a space and a shape (represented by the sequence of letters (a–h) labeling the segments that form the shape). Each digit (0–9) appears on at least one line, but the same digit may occur on multiple lines. Shapes unambiguously represent a digit, so each shape occurs on at most one line. For example (code.txt4):
0 hde 1 b 1 d 2 fcah 3 gfac 4 bgf 5 eah 6 ghfc 6 chgb 7 afh 7 ab 8 fhcaeg 9 eafh 9 ebfa
To encode a message using this box code, we only take the letters in the message (all other characters are ignored) and make no distinction between uppercase and lowercase letters.
First, each letter is converted to its position in the alphabet (A=1, B=2, ..., Z=26). Then each digit of that position is converted to a corresponding shape. If a digit can be shaped in multiple ways, we randomly choose one of these shapes. For example, the letter H is at position 8 in the alphabet, so it can be encoded as fecgha based on the above box code. If the position consists of two digits, we separate their corresponding shapes with a dash (-). For example, the letter x is at position 24 in the alphabet, so it can be encoded as hfac-bfg (where shape hfac encodes the digit 2 and shape bgf encodes the digit 4). The encoded letters of a message are separated from each other by a single space.
Your task:
Write a function read_box_code that takes the location (str) of a text file documenting the ways in which the ten digits (0–9) can be shaped. The function must return a dictionary (dict) that maps each digit (str) onto a set (set) with all shapes (str) for that digit. The letters of each shape must be listed in alphabetical order. We call this the dictionary representation of the box code.
Write a function letter2code that takes two arguments:
i) a letter
Write a function code2letter that takes two arguments:
i) an encoded letter
Write a function encode that takes two arguments: i)
a message (str) and ii) the dictionary
representation of a box code
Write a function decode that takes two arguments: i)
an encoded message (str) and ii) the dictionary
representation of the box code
All functions may assume that only valid arguments are passed, without the need to check this explicitly.
In the following interactive session we assume the text file code.txt5 to be located in the current directory.
>>> box_code = read_box_code('code.txt')
>>> box_code['0']
{'deh'}
>>> box_code['1']
{'b', 'd'}
>>> box_code['9']
{'abef', 'aefh'}
>>> letter2code('H', box_code)
'fecgha'
>>> letter2code('x', box_code)
'hfac-bfg'
>>> code2letter('fecgha', box_code)
'H'
>>> code2letter('hfac-bfg', box_code)
'X'
>>> encode('And now for something completely different!', box_code)
'b b-bgf fbg d-bfg b-hae afhc-fagc bhgc b-eha b-hfceag d-feha d-hae b-agfc hae afhc-edh ghecfa fahe d-fgb ab gcaf b-aeh d-gfca d-gcbh d-chfa eah cahf-hed eah d-acfh hfac-hea fbg fbea ghcb chgb eha b-cahfeg hae b-bgf hcfa-hed'
>>> encode('And now for something completely different!', box_code)
'd d-bgf bgf d-fgb d-hea ahcf-cfag cgfh d-ahe d-cgfeha b-fbae b-aeh b-fgac ahe afhc-ehd afghec fhea b-bfg ba cagf b-ahe d-cagf b-hgbc b-chfa aeh afhc-hde hae d-chaf cahf-hea fgb ehaf cgbh chgb hae b-aefcgh aeh d-bfg fach-hed'
>>> decode('b b-bgf fbg d-bfg b-hae afhc-fagc bhgc b-eha b-hfceag d-feha d-hae b-agfc hae afhc-edh ghecfa fahe d-fgb ab gcaf b-aeh d-gfca d-gcbh d-chfa eah cahf-hed eah d-acfh hfac-hea fbg fbea ghcb chgb eha b-cahfeg hae b-bgf hcfa-hed', box_code)
'ANDNOWFORSOMETHINGCOMPLETELYDIFFERENT'
>>> decode('d d-bgf bgf d-fgb d-hea ahcf-cfag cgfh d-ahe d-cgfeha b-fbae b-aeh b-fgac ahe afhc-ehd afghec fhea b-bfg ba cagf b-ahe d-cagf b-hgbc b-chfa aeh afhc-hde hae d-chaf cahf-hea fgb ehaf cgbh chgb hae b-aefcgh aeh d-bfg fach-hed', box_code)
'ANDNOWFORSOMETHINGCOMPLETELYDIFFERENT'