By default, statistical software like R returns many significant digits. The default behavior in R is to show 7 significant digits. That many digits often add no information, and the visual clutter can make it hard for the viewer to grasp the message. As an example, here are the disease rates per 10,000, computed from totals and population in R, for California across five decades:
state | year | Measles | Pertussis | Polio |
---|---|---|---|---|
California | 1940 | 37.8826320 | 18.3397861 | 0.8266512 |
California | 1950 | 13.9124205 | 4.7467350 | 1.9742639 |
California | 1960 | 14.1386471 | NA | 0.2640419 |
California | 1970 | 0.9767889 | NA | NA |
California | 1980 | 0.3743467 | 0.0515466 | NA |
We are reporting rates to seven decimal places, a precision of 0.0000001 cases per 10,000, which is negligible in the context of the changes occurring across the decades. In this case, one decimal place is more than enough and clearly makes the point that rates are decreasing:
state | year | Measles | Pertussis | Polio |
---|---|---|---|---|
California | 1940 | 37.9 | 18.3 | 0.8 |
California | 1950 | 13.9 | 4.7 | 2.0 |
California | 1960 | 14.1 | NA | 0.3 |
California | 1970 | 1.0 | NA | NA |
California | 1980 | 0.4 | 0.1 | NA |
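As a minimal sketch of how the second table can be obtained from the first, the rates below are copied by hand from the unrounded table rather than recomputed from totals and population. The names `rates`, `rate_cols`, and `rounded` are illustrative choices, not part of the original analysis.

```r
# Rates per 10,000, copied from the unrounded table above.
rates <- data.frame(
  state = "California",
  year = c(1940, 1950, 1960, 1970, 1980),
  Measles = c(37.8826320, 13.9124205, 14.1386471, 0.9767889, 0.3743467),
  Pertussis = c(18.3397861, 4.7467350, NA, NA, 0.0515466),
  Polio = c(0.8266512, 1.9742639, 0.2640419, NA, NA)
)

# Round only the rate columns to one decimal place; NA values stay NA.
rate_cols <- c("Measles", "Pertussis", "Polio")
rounded <- rates
rounded[rate_cols] <- lapply(rates[rate_cols], round, digits = 1)
rounded
```

Rounding a copy leaves the full-precision values in `rates` available for any later computation, so the loss of digits affects only what is displayed.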
Useful ways to change the number of significant digits or to round numbers are `signif` and `round`. You can define the number of significant digits globally by setting options like this: `options(digits = 3)`.
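The difference between the two functions is easiest to see on a small and a large value. The sketch below uses two rates from the California table; the variable names `x`, `y`, and `old` are illustrative.

```r
x <- 0.0515466   # 1980 pertussis rate from the table above
y <- 37.8826320  # 1940 measles rate from the table above

signif(x, 2)  # two significant digits: 0.052
round(x, 2)   # two decimal places:     0.05

signif(y, 2)  # two significant digits: 38
round(y, 2)   # two decimal places:     37.88

# options() returns the previous settings, so they can be restored afterwards.
old <- options(digits = 3)
print(y)      # displayed with 3 significant digits: 37.9
options(old)
```

Note that `options(digits = 3)` changes only how numbers are printed; the stored values keep their full precision, whereas `signif` and `round` return new, less precise values.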
Another principle for displaying tables is to place the values being compared in columns rather than rows. Note that our rounded table above, where each disease's rates run down a single column, is easier to read than this one:
state | disease | 1940 | 1950 | 1960 | 1970 | 1980 |
---|---|---|---|---|---|---|
California | Measles | 37.9 | 13.9 | 14.1 | 1.0 | 0.4 |
California | Pertussis | 18.3 | 4.7 | NA | NA | 0.1 |
California | Polio | 0.8 | 2.0 | 0.3 | NA | NA |