It's Boxing Day1 and the manager of a band promises to give the 10 musicians a huge bonus if they succeed in the following challenge.
The manager has randomly placed the 10 different instruments of the musicians in 10 large concealed boxes, which he has put in a windowless, soundproof practice room. He has also randomly marked the boxes with pictures of the 10 different instruments, but the pictures on the boxes do not necessarily correspond to the instruments inside. One by one, each musician has to enter the practice room and search for his own instrument. While he's in the practice room, he can look inside any five boxes. If he has found his own instrument, he must close the boxes before leaving the practice room using another door. The musicians can't in any way communicate to each other what they find and they only succeed if each one of them has found his own instrument.
Each musician only has a 50% chance of finding his instrument by picking five random boxes. And the chances that all ten will succeed are even lower: just \[\left(\frac{1}{2}\right)^{10} = \frac{1}{1024}\] That's less than 0.1%.
The band however comes up with a strategy that has a better than 35% chance of working. Every musician first opens the box with the picture of his own instrument. If his instrument is inside the box, he's done and succeeds for his part of the challenge. Otherwise he looks at whatever instrument is in the box, and then opens the box with that picture on it.
He keeps going that way until he has either opened five boxes without finding his instrument (which fails the challenge) or finds his instrument (which makes him succeed for his part of the challenge).
This strategy works because each musician follows a linked sequence that starts with the box whose outside matches his instrument and ends with the box actually containing it. Note that if he kept going, that would lead him back to the start, making the linked sequence of boxes a loop. In graph theory2, such a loop is called a cycle3.
Following the linked sequence works much better than random guessing. For an even number of boxes $$m = 2n$$ it has been proven that the chance that all cycles will be of length $$n$$ or less can be computed as \[ 1 - \left( \frac{1}{n+1} + \frac{1}{n+2} + \frac{1}{n+3} + \cdots + \frac{1}{2n} \right)\] Plug in $$m = 10$$ boxes, and we get odds of about 35%. As $$n$$ increases, the odds approach 30%. Not a guarantee, but with a bit of musician's luck, it's far from hopeless they can claim their bonus.
There are $$m$$ different items that are described by a unique string. None of the item descriptions contains a comma. The items are placed inside $$m$$ boxes that have also been marked with pictures of the items, but the pictures on the boxes do not necessarily correspond to the items inside.
The pictures on the boxes and the items inside are described by a text file. Each line of the file describes a box using two item descriptions that are separated by a comma (,). The first description corresponds to the picture on the box and the second description corresponds to the item inside the box. Such a text file may then look like
guitar,conga saxophone,banjo conga,maracas drum set,saxophone trumpet,guitar bass guitar,trombone synthesizer,trumpet banjo,drum set trombone,bass guitar maracas,synthesizer
Here's a graphical representation of the pictures on the boxes and the items inside the boxes, as described by the text file.
Your task:
Write a function read_boxes that takes the location of a text file (str) describing the pictures on a set of boxes and the items inside the boxes. The function must return a dictionary (dict) that maps the pictures on the boxes onto the items inside the boxes. The pictures and the items are described using the same string representations as in the given text file. If the text file does not contain a valid description of boxes, an AssertionError must be raised with the message invalid boxes. This is the case if the same picture appears on multiple boxes, if the same item is inside multiple boxes, or if the pictures on the boxes do not correspond to the items inside the boxes except for their order.
Write a function cycle that takes two arguments: i) the description of an item (str) and ii) a dictionary (dict) as returned by the function read_boxes that maps the pictures on a set of boxes onto the items inside the boxes. If the given item was not placed inside any of the given boxes, an AssertionError must be raised with the message invalid item. Otherwise the function must return a tuple (tuple) describing the cycle (a linked sequence of boxes) that starts with the given item.
Write a function longest_cycle that takes a dictionary (dict) as returned by the function read_boxes that maps the pictures on a set of boxes onto the items inside the boxes. The function must return the length (int) of the longest cycle (a linked sequence of boxes), where each item in the given boxes can be the starting point of a cycle.
In the following interactive session we assume the text files boxes01.txt4, boxes02.txt5, boxes03.txt6 and boxes04.txt7 to be located in the current directory.
>>> boxes = read_boxes('boxes01.txt')
>>> boxes
{'guitar': 'conga', 'saxophone': 'banjo', 'conga': 'maracas', 'drum set': 'saxophone', 'trumpet': 'guitar', 'bass guitar': 'trombone', 'synthesizer': 'trumpet', 'banjo': 'drum set', 'trombone': 'bass guitar', 'maracas': 'synthesizer'}
>>> cycle('guitar', boxes)
('guitar', 'conga', 'maracas', 'synthesizer', 'trumpet')
>>> cycle('saxophone', boxes)
('saxophone', 'banjo', 'drum set')
>>> cycle('bass guitar', boxes)
('bass guitar', 'trombone')
>>> cycle('clarinet', boxes)
Traceback (most recent call last):
AssertionError: invalid item
>>> longest_cycle(boxes)
5
>>> read_boxes('boxes02.txt') # 2x same label
Traceback (most recent call last):
AssertionError: invalid boxes
>>> read_boxes('boxes03.txt') # 2x same content
Traceback (most recent call last):
AssertionError: invalid boxes
>>> read_boxes('boxes04.txt') # labels differ from contents
Traceback (most recent call last):
AssertionError: invalid boxes
Gál A, Miltersen PB (2003). The cell probe complexity of succinct data structures. In International Colloquium on Automata, Languages, and Programming, Springer, Berlin, Heidelberg, 332–344. 8
Gál A, Miltersen PB (2007). The cell probe complexity of succinct data structures. Theoretical Computer Science 379(3), 405–417. 9
Winkler P (2006). Seven puzzles you think you must no have heard correctly. Seventh Gathering for Gardner (in honour of Martin Gardner).10