Benford's law describes the expected distribution of the first digits of numbers in many kinds of large data sets, such as street addresses, population numbers and lengths of rivers. The law was first discovered in 1881 by the American mathematician and astronomer Simon Newcomb, but it became widely known only after its rediscovery and publication in 1938 by Frank Benford, a physicist who spent his entire career at the American company General Electric.

Figure: Distribution of first digits of numbers according to Benford's law.

In 1938 Benford published an article describing his observation that in many (but not all) collections of numbers from daily life, 1 is the most common first digit. Fewer numbers start with a 2, and so on down to 9, which is the least common first digit. The digits 1 to 9 are therefore not equally likely to appear as the first digit of a number. Benford showed that in a large collection of numbers, the probability that a number starts with a 1 is about 30%, whereas the probability that it starts with a 9 is only about 5%. This regularity became known as Benford's law.

Mathematically, this law is expressed through the following probability function. The probability of the first digit $$D_1$$ of a number being equal to $$d$$ is given by: \[ P(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right) \text{ for } d = 1, \ldots, 9 \] This probability distribution is displayed in the table below:

first digit         1     2     3     4     5     6     7     8     9
probability (%)  30.1  17.6  12.5   9.7   7.9   6.7   5.8   5.1   4.6
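
The percentages in the table can be recomputed directly from the formula. The following is a minimal sketch, assuming a POSIX shell with awk available; the function name benford_table is only an illustrative choice and is not part of the assignment.

```shell
# Print the Benford probability (in %) of each first digit d = 1..9.
# awk's log() is the natural logarithm, so log10(x) = log(x) / log(10).
# benford_table is a hypothetical helper name used for illustration.
benford_table() {
    awk 'BEGIN {
        for (d = 1; d <= 9; d++)
            printf "%d %.1f\n", d, 100 * log(1 + 1 / d) / log(10)
    }'
}

benford_table
```

Each output line contains a digit followed by its probability, rounded to one decimal place as in the table.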

Benford investigated enormous amounts of numerical data, such as the lengths of rivers, the surface areas of lakes and farmlands, heights of mountains, numerical phenomena from physics and chemistry, mathematical tables, and numbers from magazines and newspapers. In doing so, he kept finding more empirical evidence for the correctness of his formula. He was, however, unable to explain why it holds.

Assignment

Complete the following Unix command so that it writes to standard output (stdout) a frequency table of the first digits of the natural numbers found in the third column of the given text file.

$ cat <filename> | …

In the given text file, columns are separated by commas (,). The data values themselves do not contain commas. The third column always contains a strictly positive natural number. In the output, the first digits of these numbers must be ordered by decreasing number of occurrences. Digits with equal numbers of occurrences must be sorted in increasing order.
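
The ordering rule can be illustrated on a small table that has already been counted. This is only a sketch of the tie-breaking behaviour, assuming input lines of the form "count digit" and a POSIX sort; the helper name order_counts and the sample data are made up for illustration, and the counting itself is left out.

```shell
# Hypothetical helper: order "count digit" lines by decreasing count
# (first field, numeric, reversed) and break ties by increasing digit
# (second field, numeric).
order_counts() {
    sort -k1,1nr -k2,2n
}

# Made-up sample: digits 7 and 2 both occur 3 times, digit 1 occurs 5 times.
printf '3 7\n5 1\n3 2\n' | order_counts
```

On this sample, the line for digit 1 comes first because it has the highest count, followed by digit 2 and then digit 7, which share the same count.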

Examples

$ cat data01.txt | …
   2154 1
   1183 2
    912 3
    724 4
    573 5
    485 6
    400 7
    359 8
    342 9
$ cat data02.txt | …
   5434 1
   1242 7
     94 3
     82 2
     44 4
     20 8
     16 5
     11 6
      7 9

Submission guidelines

Submit only the missing part of the command, i.e. the part that replaces the three dots (…).