Prize-winning poet Wendy Cope's beautiful love poem Another Valentine, commissioned by the Daily Telegraph1 in 2009, explores the frequent criticisms levelled at the most romantic day of the year. Yet as the narrator muses on the obligation behind Valentine's Day, romantic feelings are conjured.
Today we are obliged to be romantic
And think of yet another valentine.
We know the rules and we are both pedantic:
Today's the day we have to be romantic.
Our love is old and sure, not new and frantic.
You know I'm yours and I know you are mine.
And saying that has made me feel romantic,
My dearest love, my darling valentine.
The attraction of love is reinforced by a special feature of the poem. Pick any word in the first three lines, count its letters and move ahead the corresponding number of words. For example, if you start at the word obliged on the first line, you'd count 7 letters and move ahead 7 words, landing on the word yet on the second line. Count that word's letters and continue in this manner until you've reached the end of the poem. Whatever word you choose as a starting point in the first three lines, you'll always arrive at the word love on the last line of the poem. You can check this by hoovering the mouse over the words of the poem. This way you can also check that if you start at the word day on the fourth line, you finally arrive at the word romantic on the penultimate line of the poem.
This special property is less surprising than it seems — it's based on a principle called the Kruskal count. It was originally proffered by Rutgers University physicist Martin Kruskal as a card trick, whose magical effect has been known to perplex professional magicians because — as he liked to say — it was based not on sleight of hand but on a mathematical phenomenon. In both the card trick and the love poem various tributaries merge into a common stream that arrives at a predictable destination. We further test the counting trick on the first three verses of Genesis:
In the beginning God created the heaven and the earth.
And the earth was without form, and void;
and darkness was upon the face of the deep.
And the Spirit of God moved upon the face of the waters.
And God said, Let there be light: and there was light.
Pick any word in the first verse, count its letters, and move ahead by the corresponding number of words. For example, if you start at beginning, you'd count 9 letters and move ahead 9 words, landing on the in the second verse. Count that word's letters and continue in this manner until you've entered the third verse. You'll always arrive at God. It also works for this popular lullaby:
Twinkle, twinkle, little star,
How I wonder what you are.
Up above the world so high,
Like a diamond in the sky.
Twinkle, twinkle, little star,
How I wonder what you are.
Choose any word in the first two lines, count its letters, and count forward that number of words. For example, if you choose star, which has four letters, you'd count ahead four words, beginning with how, to reach what. Count the number of letters in that word and count ahead as before. Continue until you can't go any further. You'll always land on you in the last line. The last example was found by Martin Gardner in the opening of Alice's Adventures in Wonderland:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do; once or twice she had peeped into the book her sister was reading.
Choose any of the first 26 words and tap your way forward in the sentence from that point, tapping one word for each letter. So, for example, if you choose the word Alice, which has five letters, you'd tap was, beginning, to, get, and land on very. Then do the same thing with that word, advancing four letters to land on by. If you keep this up you'll always arrive at the word sister near the end of the fragment.
Your task is to examine whether the counting trick works for the text contained in a given text file. We agree that whenever we split a text into words, the words are made up of the longest possible contiguous sequence of letters (uppercase or lowercase). That means for example that the string Today's contains two words: Today and s. You are asked to:
Write a function split that takes a string (str) as its argument. The function must return a list that contains the sequence of words (str) in the given string.
Write a function sequence that takes a sequence (list or tuple) of words (str). The function also has a second optional parameter start that takes an integer (int; default value: 0). The function must return a tuple with two elements. The first element is the list of words (str) that are visited when Kruskal count is applied on the given sequence of words, starting at the word whose position was passed to the parameter start. The second element is an integer (int) that indicates the maximal number of words that could be appended to the given sequence, without the possibility to jump to the next word when Kruskal count is applied to the word at the given starting position.
The above figure illustrates how the function sequence processes the words on the first line in the love poem. If we start at position 1, the list of words marked in yellow must be returned as the first element of the tuple. From the word obliged we would move ahead another 7 words, which is four words beyond the edge of the sequence. As a result, we could append 3 more words to the sequence without making it possible to jump to another word, which is the value that needs to be returned as the second element of the tuple.
Write a function kruskal that takes the location (str) of a text file. The function also has a second optional parameter start that takes an integer (int; default value: 0). The function also has a third optional parameter last that takes a Boolean value (bool; default value: False). The function must return the list of words (str) visited when Kruskal count is applied to the text contained in the given file, starting at the word whose position was passed to the parameter start. Positions of words are indexed starting from 0. In case the value False was passed to the parameter last, Kruskal count must be applied until the end of the text is reached. In case the value True was passed to the parameter last, Kruskal count must be applied until the first word is reached on the last line of the text. In the latter case, if no word is reached on the last line of the text, the function must raise an AssertionError with the message no word visited on last line.
Write a function trick that takes the location (str) of a text file. The function also has an optional parameter last that has the same meaning as with the function kruskal. The function must return a Boolean value (bool) that indicates whether all words on the first line of the given text file finally end up at the same word if Kruskal count is applied as in the function kruskal. Note that this means that it is possible that the function must raise an AssertionError with the message no word visited on last line.
In the following interactive session we assume that the text files cope.txt2, genesis.txt3, twinkle.txt4 and alice.txt5 are located in the current directory.
>>> split('Today we are obliged to be romantic')
['Today', 'we', 'are', 'obliged', 'to', 'be', 'romantic']
>>> split('And think of yet another valentine.')
['And', 'think', 'of', 'yet', 'another', 'valentine']
>>> split('We know the rules and we are both pedantic:')
['We', 'know', 'the', 'rules', 'and', 'we', 'are', 'both', 'pedantic']
>>> split("Today's the day we have to be romantic.")
['Today', 's', 'the', 'day', 'we', 'have', 'to', 'be', 'romantic']
>>> words = ['Today', 'we', 'are', 'obliged', 'to', 'be', 'romantic']
>>> sequence(words)
(['Today', 'be'], 0)
>>> sequence(words, 1)
(['we', 'obliged'], 3)
>>> sequence(words, start=2)
(['are', 'be'], 0)
>>> kruskal('cope.txt6', 3)
['obliged', 'yet', 'We', 'the', 'we', 'both', 'the', 'have', 'Our', 'old', 'not', 'frantic', 'I', 'know', 'And', 'has', 'feel', 'love']
>>> kruskal('cope.txt7', 21)
['pedantic', 'be', 'Our', 'old', 'not', 'frantic', 'I', 'know', 'And', 'has', 'feel', 'love']
>>> kruskal('cope.txt8', 25)
['day', 'to', 'romantic', 'new', 'You', 'm', 'yours', 'are', 'saying', 'romantic']
>>> kruskal('cope.txt9', 25, last=True)
Traceback (most recent call last):
AssertionError: no word visited on last line
>>> trick('cope.txt10')
True
>>> trick('cope.txt11', last=True)
True
>>> trick('genesis.txt12')
True
>>> trick('genesis.txt13', last=True)
True
>>> trick('twinkle.txt14')
True
>>> trick('twinkle.txt15', last=True)
True
>>> trick('alice.txt16')
True
>>> trick('alice.txt17', last=True)
False