In 1951 James Thurber1's friend Joseph Mitchell2 challenged him to think of an English word that contains the four consecutive letters SGRA. Lying in bed that night, Thurber came up with these:

Unfortunately, none of these is in the dictionary. What word was Mitchell thinking of?

Disgrace.

This may well be the most common word containing the consecutive letters SGRA, in most English dictionaries it is definitely not the only one. Consider for example the words palsgrave3 (a high noble title), grosgrain4 (a type of fabric characterized by its ribbed appearance), dysgraphia5 (a deficiency in the ability to write) or sgraffito6 (a technique either of wall decor or in pottery). However, these are all words of foreign origin.

Assignment

An $$n$$-gram is a string (str) with $$n$$ letters. Only the 26 letters of the alphabet (az) may appear in an $$n$$-gram, so spaces, punctuation marks or letters with diacritics7 are not allowed. Your task:

The functions ngram_count and unique_ngram may never change the given dictionary. The text files passed to the function dictionary use the UTF-88 character encoding. The built-in function open has a parameter encoding that can be used to specify the character encoding for the file:

>>> open('file.txt', 'r', encoding='utf-8')

Epilogue

From a 1921 essay by A.A. Milne9:

TERALBAY is not a word which one uses much in ordinary life. Rearrange the letters, however, and it becomes such a word. A friend — no, I can call him a friend no longer — a person gave me this collection of letters as I was going to bed and challenged me to make a proper word of it. He added that Lord Melbourne — this, he alleged, is a well-known historical fact — Lord Melbourne had given this word to Queen Victoria once, and it had kept her awake the whole night. After this, one could not be so disloyal as to solve it at once. For two hours or so, therefore, I merely toyed with it. Whenever I seemed to be getting warm I hurriedly thought of something else. This quixotic loyalty has been the undoing of me; my chances of a solution have slipped by, and I am beginning to fear that they will never return. While this is the case, the only word I can write about is TERALBAY.

The answer is not RATEABLY, or BAT-EARLY, which "ought to mean something, but it doesn't". Rudolf Flesch10 notes that TRAYABLE is not a word, and that, though TEARABLY appears in small type in Webster's Unabridged, "it obviously won't do".

What's the answer? There's no trick — it's an ordinary English word.

BETRAYAL

Example

In the following interactive session we assume the text files dictionary.en.txt11 and dictionary.nl.txt12 to be located in the current directory.

>>> ngrams('Disgrace', 4)
['Disg', 'isgr', 'sgra', 'grac', 'race']
>>> ngrams('four-year-old', 3)
['fou', 'our', 'yea', 'ear', 'old']
>>> ngrams('façade', 2)
['fa', 'ad', 'de']
>>> ngrams('semi-self-sustaining', 4)
['semi', 'self', 'sust', 'usta', 'stai', 'tain', 'aini', 'inin', 'ning']

>>> english = dictionary('dictionary.en.txt13', 4)
>>> english['SGRA']
{'disgrading', 'disgrade', 'disgracers', 'misgrafts', 'palsgrave', 'hansgrave', 'disgracement', 'misgraded', 'misgrading', 'disgracer', 'disgraced', 'grosgrain', 'misgracious', 'disgracive', 'disgrace', 'predisgrace', 'palsgraf', 'disgracing', 'dysgraphia', 'misgrafting', 'disgraded', 'disgracefulness', 'misgraft', 'misgrave', 'misgraffed', 'misgrade', 'grosgrained', 'disgraceful', 'disgracia', 'disgracefully', 'disgracious', 'grosgrains', 'disgradation', 'disgraces', 'sgraffiti', 'crossgrainedness', 'misgraff', 'sgraffiato', 'undisgraced', 'disgradulate', 'misgrafted', 'sgraffito', 'palsgravine'}
>>> ngram_count(english)
63546
>>> ngram_count(english, 1)
13711
>>> ngram_count(english, 2)
7408
>>> ngram_count(english, 3)
4613
>>> unique_ngram(english)
('RLBU', 'pearlbush')
>>> unique_ngram(english)
('FIFR', 'kefifrel')
>>> unique_ngram(english)
('PPAT', 'wappato')

>>> english = dictionary('dictionary.en.txt14', 3)
>>> english['GNT']
{'sovereignties', 'pgntt', 'supersovereignty', 'pgnttrp', 'sovereignty', 'cosovereignty', 'semisovereignty'}
>>> ngram_count(english)
9025
>>> ngram_count(english, 1)
1125
>>> ngram_count(english, 2)
561
>>> ngram_count(english, 3)
332
>>> unique_ngram(english)
('HMP', 'rhythmproof')
>>> unique_ngram(english)
('IPZ', 'leipzig')
>>> unique_ngram(english)
('LZH', 'alzheimer')

>>> dutch = dictionary('dictionary.nl.txt15', 4)
>>> dutch['SGRA']
{'visgraatje', 'bezettingsgraad', 'hardheidsgraad', 'asgrauwe', 'jongensgrappen', 'verzuringsgraad', 'doctorsgraden', 'alfabetiseringsgraad', 'werkgelegenheidsgraad', 'schansgravers', 'orpheusgrasmussen', 'visgraatmotief', 'moeilijkheidsgraad', 'werkloosheidsgraad', 'stadsgrachten', 'zeemansgraf', 'tewerkstellingsgraad', 'basisgrammatica', 'varkensgras', 'werkloosheidsgraden', 'werkingsgraad', 'leesgrage', 'scholingsgraad', 'ontwikkelingsgraad', 'vullingsgraad', 'glansgraad', 'Bosgra', 'koersgrafieken', 'vervuilingsgraad', 'activiteitsgraad', 'traangasgranaten', 'visgraatpatronen', 'gifgasgranaten', 'paltsgraven', 'visgraatdessins', 'kostendekkingsgraden', 'bewustzijnsgraad', 'visgraat', 'beladingsgraad', 'oorlogsgraf', 'zelfvoorzieningsgraad', 'oorlogsgraven', 'hellingsgraad', 'paltsgraaf', 'liesgras', 'koningsgraf', 'dekkingsgraden', 'moeilijkheidsgraden', 'visgraatdiagram', 'gasgranaten', 'vrijheidsgraad', 'beschavingsgraad', 'vrijheidsgraden', 'gasgranaat', 'doctorsgraad', 'asgrauw', 'besmettingsgraad', 'dekkingsgraad', 'luchtvochtigheidsgraad', 'beursgraadmeters', 'automatiseringsgraad', 'bezettingsgraden', 'leesgraag', 'Palsgraaf', 'koersgrafiek', 'stadsgracht', 'bedekkingsgraad', 'opleidingsgraad', 'olifantsgras', 'vochtigheidsgraad', 'beschermingsgraad', 'molenaarsgraaf', 'benuttingsgraad', 'zeemansgraven', 'hardheidsgraden', 'paltsgravin', 'bewolkingsgraad', 'struisgras', 'beursgraadmeter', 'koningsgraven', 'kostendekkingsgraad', 'visgraten', 'uitbuitingsgraad', 'paltsgraafschap', 'visgraatjes', 'jongensgrap', 'traangasgranaat', 'werkzaamheidsgraad', 'zuiverheidsgraad'}
>>> ngram_count(dutch)
57210
>>> ngram_count(dutch, 1)
8224
>>> ngram_count(dutch, 2)
7360
>>> ngram_count(dutch, 3)
3856
>>> unique_ngram(dutch)
('LAMR', 'glamrock')
>>> unique_ngram(dutch)
('ACGL', 'cognacglazen')
>>> unique_ngram(dutch)
('ALOL', 'afvalolie')