Prize-winning poet Wendy Cope's beautiful love poem Another Valentine, commissioned by the Daily Telegraph1 in 2009, explores the frequent criticisms levelled at the most romantic day of the year. Yet as the narrator muses on the obligation behind Valentine's Day, romantic feelings are conjured.

Today we are obliged to be romantic
And think of yet another valentine.
We know the rules and we are both pedantic:
Today's the day we have to be romantic.

Our love is old and sure, not new and frantic.
You know I'm yours and I know you are mine.
And saying that has made me feel romantic,
My dearest love, my darling valentine.

The attraction of love is reinforced by a special feature of the poem. Pick any word in the first three lines, count its letters and move ahead the corresponding number of words. For example, if you start at the word obliged on the first line, you'd count 7 letters and move ahead 7 words, landing on the word yet on the second line. Count that word's letters and continue in this manner until you've reached the end of the poem. Whatever word you choose as a starting point in the first three lines, you'll always arrive at the word love on the last line of the poem. You can check this by hoovering the mouse over the words of the poem. This way you can also check that if you start at the word day on the fourth line, you finally arrive at the word romantic on the penultimate line of the poem.

This special property is less surprising than it seems — it's based on a principle called the Kruskal count. It was originally proffered by Rutgers University physicist Martin Kruskal as a card trick, whose magical effect has been known to perplex professional magicians because — as he liked to say — it was based not on sleight of hand but on a mathematical phenomenon. In both the card trick and the love poem various tributaries merge into a common stream that arrives at a predictable destination. We further test the counting trick on the first three verses of Genesis:

In the beginning God created the heaven and the earth.

And the earth was without form, and void;
and darkness was upon the face of the deep.
And the Spirit of God moved upon the face of the waters.

And God said, Let there be light: and there was light.

Pick any word in the first verse, count its letters, and move ahead by the corresponding number of words. For example, if you start at beginning, you'd count 9 letters and move ahead 9 words, landing on the in the second verse. Count that word's letters and continue in this manner until you've entered the third verse. You'll always arrive at God. It also works for this popular lullaby:

Twinkle, twinkle, little star,
How I wonder what you are.
Up above the world so high,
Like a diamond in the sky.
Twinkle, twinkle, little star,
How I wonder what you are.

Choose any word in the first two lines, count its letters, and count forward that number of words. For example, if you choose star, which has four letters, you'd count ahead four words, beginning with how, to reach what. Count the number of letters in that word and count ahead as before. Continue until you can't go any further. You'll always land on you in the last line. The last example was found by Martin Gardner in the opening of Alice's Adventures in Wonderland:

Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do; once or twice she had peeped into the book her sister was reading.

Choose any of the first 26 words and tap your way forward in the sentence from that point, tapping one word for each letter. So, for example, if you choose the word Alice, which has five letters, you'd tap was, beginning, to, get, and land on very. Then do the same thing with that word, advancing four letters to land on by. If you keep this up you'll always arrive at the word sister near the end of the fragment.

Assignment

Your task is to examine whether the counting trick works for the text contained in a given text file. We agree that whenever we split a text into words, the words are made up of the longest possible contiguous sequence of letters (uppercase or lowercase). That means for example that the string Today's contains two words: Today and s. You are asked to:

Example

In the following interactive session we assume that the text files cope.txt2, genesis.txt3, twinkle.txt4 and alice.txt5 are located in the current directory.

>>> split('Today we are obliged to be romantic')
['Today', 'we', 'are', 'obliged', 'to', 'be', 'romantic']
>>> split('And think of yet another valentine.')
['And', 'think', 'of', 'yet', 'another', 'valentine']
>>> split('We know the rules and we are both pedantic:')
['We', 'know', 'the', 'rules', 'and', 'we', 'are', 'both', 'pedantic']
>>> split("Today's the day we have to be romantic.")
['Today', 's', 'the', 'day', 'we', 'have', 'to', 'be', 'romantic']

>>> words = ['Today', 'we', 'are', 'obliged', 'to', 'be', 'romantic']
>>> sequence(words)
(['Today', 'be'], 0)
>>> sequence(words, 1)
(['we', 'obliged'], 3)
>>> sequence(words, start=2)
(['are', 'be'], 0)

>>> kruskal('cope.txt6', 3)
['obliged', 'yet', 'We', 'the', 'we', 'both', 'the', 'have', 'Our', 'old', 'not', 'frantic', 'I', 'know', 'And', 'has', 'feel', 'love']
>>> kruskal('cope.txt7', 21)
['pedantic', 'be', 'Our', 'old', 'not', 'frantic', 'I', 'know', 'And', 'has', 'feel', 'love']
>>> kruskal('cope.txt8', 25)
['day', 'to', 'romantic', 'new', 'You', 'm', 'yours', 'are', 'saying', 'romantic']
>>> kruskal('cope.txt9', 25, last=True)
Traceback (most recent call last):
AssertionError: no word visited on last line

>>> trick('cope.txt10')
True
>>> trick('cope.txt11', last=True)
True

>>> trick('genesis.txt12')
True
>>> trick('genesis.txt13', last=True)
True

>>> trick('twinkle.txt14')
True
>>> trick('twinkle.txt15', last=True)
True

>>> trick('alice.txt16')
True
>>> trick('alice.txt17', last=True)
False

Resources