Following the geographical migration in the United States during the pre-war economic depression, American comedian Will Rogers allegedly said:
When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.
He was obviously joking, but the effect is possible in principle. Consider for example the following two integer sequences: \[ \begin{eqnarray}A &=& [5, 6, 7, 8, 9] \\ B &=& [1, 2, 3, 4] \end{eqnarray} \] If we move element 5 from sequence $$A$$ to sequence $$B$$, the average of both sequences increases: the average of $$A$$ rises from 7 to 7.5, and the average of $$B$$ rises from 2.5 to 3.
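The arithmetic behind this example can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for this check and are not part of the assignment.

```python
# The two sequences from the example above.
A = [5, 6, 7, 8, 9]
B = [1, 2, 3, 4]

# Averages before moving anything.
avg_a_before = sum(A) / len(A)
avg_b_before = sum(B) / len(B)

# Move element 5 from A to B, working on new lists so A and B stay intact.
A_moved = [x for x in A if x != 5]
B_moved = B + [5]

# Averages after the move.
avg_a_after = sum(A_moved) / len(A_moved)
avg_b_after = sum(B_moved) / len(B_moved)

print(avg_a_before, avg_a_after)  # 7.0 7.5
print(avg_b_before, avg_b_after)  # 2.5 3.0
```

The move works because 5 is below the average of $$A$$ but above the average of $$B$$.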
This so-called Will Rogers phenomenon produces a somewhat paradoxical effect when medical doctors find a better way to detect illness. As a result, relatively healthy people are moved from the "well" category to the "ill" category, and the average health of both populations improves even before treatment takes place.
We represent an integer sequence as a sequence (list or tuple) of integers (int), and you may assume that such a sequence is never empty. Your task:
Write a function average that takes a sequence (list or tuple) of integers (int). The function must return the average (float) of the integers in the given sequence.
Write a function move1 that takes three arguments: two lists $$a$$ and $$b$$ (list) of integers (int), and a sequence $$c$$ (list or tuple) of integers (int) that are sampled from list $$a$$. The function must return the value None. After calling the function, all numbers from sequence $$c$$ must be removed from list $$a$$ and added in the given order to the end of list $$b$$. If a number has multiple occurrences in list $$a$$, its first occurrence must be removed.
Write a function move2 that takes three arguments: two sequences $$a$$ and $$b$$ (list or tuple) of integers (int), and a sequence $$c$$ (list or tuple) of integers (int) that are sampled from sequence $$a$$. The function may not change any of its arguments, and must return a tuple (tuple) containing two new lists (list). The first list contains all numbers (int) from sequence $$a$$, in order, but with all numbers from sequence $$c$$ removed. If a number has multiple occurrences in sequence $$a$$, its first occurrence must be removed. The second list contains all numbers (int) from sequence $$b$$, followed by all numbers (int) from sequence $$c$$, in the given order.
Write a function iswillrogers that takes three arguments: two sequences $$a$$ and $$b$$ (list or tuple) of integers (int), and a sequence $$c$$ (list or tuple) of integers (int) that are sampled from sequence $$a$$. The function may not change any of its arguments, and must return a Boolean value (bool) that indicates if the average of both sequences $$a$$ and $$b$$ would increase if the numbers from sequence $$c$$ were removed from sequence $$a$$ and added to sequence $$b$$.
>>> average((5, 6, 7, 8, 9))
7.0
>>> average([1, 2, 3, 4])
2.5
>>> seq1 = [5, 6, 7, 8, 9]
>>> seq2 = [1, 2, 3, 4]
>>> seq3 = [5]
>>> move1(seq1, seq2, seq3)
>>> seq1
[6, 7, 8, 9]
>>> seq2
[1, 2, 3, 4, 5]
>>> seq3
[5]
>>> seq1 = (5, 6, 7, 8, 9)
>>> seq2 = [1, 2, 3, 4]
>>> seq3 = [5]
>>> move2(seq1, seq2, seq3)
([6, 7, 8, 9], [1, 2, 3, 4, 5])
>>> seq1
(5, 6, 7, 8, 9)
>>> seq2
[1, 2, 3, 4]
>>> seq3
[5]
>>> iswillrogers([5, 6, 7, 8, 9], [1, 2, 3, 4], [5])
True
>>> iswillrogers((5, 6, 7, 8, 9), (1, 2, 3, 4), (7, 9))
False
The Will Rogers phenomenon occurs in practice when comparing groups of patients with carcinoma who are classified into stages according to the TNM system. For example, Feinstein et al. compared two groups of patients suffering from lung carcinoma, diagnosed in 1953–54 and in 1977 respectively. While the distribution of patients over the TNM stages I–III was the same in both groups, survival after 6 months was found to be better for all stages in the 1977 group.
In 1977, however, modern methods such as computed tomography, ultrasound and isotope testing had been used extensively for stage classification purposes. Had the 1977 patients been classified without these modern diagnostic techniques, a significant number of them would have been assigned a more favorable stage. With survival after 6 months recalculated under that classification, it would not differ from the survival of the patients treated in 1953–54.
Improved survival in 1977 thus appeared not to be a result of improved therapy, but of a more accurate classification into TNM stages using new diagnostic techniques. This study shows the danger of drawing conclusions from comparisons with historical control groups, even when both groups apparently use the same classification scheme.
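The stage migration effect described above can be illustrated with a small numerical sketch. All numbers below are invented for illustration and are not taken from the study: each integer stands for a hypothetical per-patient survival score, and the stage names are placeholders.

```python
def average(seq):
    """Return the average (float) of a non-empty sequence of integers."""
    return sum(seq) / len(seq)


# Hypothetical survival scores under the old classification.
early_old = [90, 85, 80, 60, 55]
late_old = [40, 35, 30, 20]

# Better diagnostics reclassify the two weakest "early" patients as "late".
migrated = [60, 55]
early_new = [score for score in early_old if score not in migrated]
late_new = late_old + migrated

# Both stage averages go up ...
print(average(early_old), average(early_new))  # 74.0 85.0
print(average(late_old), average(late_new))    # 31.25 40.0

# ... although no individual score changed, so the overall average is the same.
overall_old = average(early_old + late_old)
overall_new = average(early_new + late_new)
print(overall_old, overall_new)                # 55.0 55.0
```

In this toy setting the per-stage figures improve purely because patients were relabeled, which is the pattern the study warns about.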
Feinstein AR, Sosin DM, Wells CK (1985). The Will Rogers phenomenon. Stage migration and new diagnostic techniques as source of misleading statistics for survival in cancer. The New England Journal of Medicine 312(25), 1604-1608.
Sormani MP, Tintorè M, Rovaris M, Rovira A, Vidal X, Bruzzi P, Filippi M, Montalban X (2008). Will Rogers phenomenon in multiple sclerosis. Annals of Neurology 64(4), 428-433.