During the Vietnam War "clandestine" communication between US military forces was based on the Diana Cryptosystem. This methode to encode en decode messages is theoretically unbreakable when used properly. The cryptographic cipher is based on two techniques.

First of all, it uses a trigraph to convert two letters into a third letter. This conversion uses a fixed lookup table — an example of which is pictured below — that maps each combination of two black letters into a third red letter. What is specific about this conversion is that for each triplet of letters it holds that any two letters of the triplet are always converted into the third letter of the triplet. For example, if it is given that the letters B and K are converted in the letter O, then we also know that the letters K and B yield the letter O, and also that the letters O and B yield the letter K, and that the letters K and B yield the letter O. Actually, it is quite easy to compute the conversion. If we find the two given letters in the alphabet at positions $$p_1$$ and $$p_2$$ (where A is at position 0, B is at position 1, and so on), then we find the third letter in the alphabet at position $$(25 - p_1 - p_2)\ \textrm{mod}\ 26$$, where the operator $$\textrm{mod}$$ represents the remainder after integer division.

trigraph

In addition, the Diana Cryptosystem makes use of a so-called one-time pad. This is nothing more than a randomly generated list of letters. To improve readability the letters are usually displayed in groups of five letters, but in general each character in the one-time pad that is not a letter may be ignored (also including the spaces used for formatting the groups). As an example we consider the following one-time pad.

WHTVI AUCFU RETFK OMSAL
MYMNE ZIEGP UKVTF WZHOK
GORWY WETFR COYET OOWHY
ZPDDA CMMXT VYTJI RRQGU
VAXPM IPIXU QUXIP MAXIU

To encode the message ATTACK AT DAWN, we proceed as follows. First, a random $$n$$-letter fragment is selected from the one-time pad, where $$n \in \mathbb{N}_0$$ is fixed as part of the cryptosystem. Say, for example, that we have chosen the ten letters UKVTF WZHOK. The letters of the message (all characters that are not letters must be ignored) are then written under the letters of the one-time pad that follow the randomly selected fragment. Thereafter each pair of letters (a letters from the one-time pad and a letter from the original message at the same position) are converted into a third letters using the trigraph lookup table. The resulting letters form the encrypted message.

     one-time pad: UKVTF WZHOK GORWY WETFR COYET OOWHY
origineel bericht:             ATTAC KATDA WN
                   -----------------------------------
gecodeerd bericht:             TSPDZ TVNRI BY

The text UKVTF WZHOK TSPDZ TVNRI BY is transmitted by morse code: the letters of the randomly selected fragment are transmitted unencrypted and the letters of the message are transmitted encrypted. Because of the symmetric nature of the trigraph, decoding a message follows the exact same procedure as encoding a message. First the recipient looks up the first ten letters of the message in the one-time pad (the recipient must use the same one-time pad as the sender), and decodes the remaining letters by again making use the trigraph lookup table.

     one-time pad: UKVTF WZHOK GORWY WETFR COYET OOWHY
gecodeerd bericht:             TSPDZ TVNRI BY            
                   -----------------------------------
origineel bericht:             ATTAC KATDA WN

A military of the US forces that was fighting in the Vietnam War explains one use case of the Diana Cryptosystem in the following way:

Special Forces were one of (if not the only) units in Vietnam to utilize Morse code on a regular basis. We used a method of encryption called the Diana Cryptosystem.

The basis of these
one-time pads, is that there were only two matching pads in existence, and they would only be used one time. They were booklets that contained randomly generated groups of 5-letter words, 30 words to a page. The person sending a message would first write the letters to the message, over these random groups of words. Included in the front of each one-time pad was a one-page encryption table. If I wanted to send the letter P, and the letter under the P was an A, then I would send a K. The person listening on the frequency at the other end, would have the other matching pad. They would write the letter they received (a K) over the letter in their one-time pad (an A), and decipher it based on the table, yielding the original letter P.

Each communication site in Vietnam (we had over 100 A-Camps along the Cambodian / Laotian border, and some 20 B-detachment sites spread over the country) had a different pad, depending on the location they were having the commo-check with. It obviously was very important that both people were using the appropriate matching pads, or the deciphered messages would not make any sense.

After a while, most of us became so proficient with the system, that we actually learned the deciphering matrix by heart. No matter what pads anyone had, the combinations always were the same. i.e. Any 3 letters always went together, regardless of the order; BKO/KOB/OBK/BOK. After listening to thousands and thousands of transmissions, it really got quite simple. If I was listening to code, and a letter B was sent (now remember, we usually sent around 20-25
words (5 letters per word) a minute, hence the importance of the speed keys!), and the letter it was associated with was an O, most of us would decipher as we heard it, and just write the K. That may sound like quite a yarn, but it is absolutely true.

The combination of a trigraph lookup table and a one-time pad may seem like a neat parlor trick, but the resulting cryptographic cipher is actually quite strong. In fact, assuming that the pads are truly randomly generated, never reused and never compromised the system is unbreakable. It therefore comes as no surprise that was and still is used by many intelligence agencies around the world. The KGB often issued its agents one-time pads printed on tiny sheets of flash paper — paper chemically converted to nitrocellulose, which burns almost instantly and leaves no ash.

Assignment

Define a class Diana that can be used to encode and decode messages according to the Diana Cryptosystem using a given one-time pad. This class must support at least the following methods:

Example

In the following interactive session we assume that the text file otp.txt1 is located in the current directory.

>>> diana = Diana('otp.txt')

>>> diana.index('UKVTF WZHOK')
40
>>> diana.index('CMMXT VYTJI RRQGU')
80
>>> diana.index('ABCDE FGHIJ')
Traceback (most recent call last):
AssertionError: invalid prefix

>>> diana.trigraph('Q', 'K')
'Z'
>>> diana.trigraph('t', 'f')
'B'

>>> diana.encode('UKVTF WZHOK attack at dawn')
'TSPDZTVNRIBY'
>>> diana.encode('CMMXT VYTJI RRQGU meet at ten tonight', 15)
'SVYRNYRNPMVSULDU'
>>> diana.encode('ABCDE FGHIJ zero dark thirty')
Traceback (most recent call last):
AssertionError: invalid prefix
>>> diana.encode('VAXPM IPIXU zero dark thirty')
Traceback (most recent call last):
AssertionError: one-time pad is too short

>>> diana.decode('UKVTF WZHOK TSPDZ TVNRI BY')
'ATTACKATDAWN'
>>> diana.decode('CMMXT VYTJI RRQGU SVYRN YRNPM VSULD U', 15)
'MEETATTENTONIGHT'
>>> diana.decode('ABCDE FGHIJ zero dark thirty')
Traceback (most recent call last):
AssertionError: invalid prefix
>>> diana.decode('VAXPM IPIXU zero dark thirty')
Traceback (most recent call last):
AssertionError: one-time pad is too short