When one of the axes is used to show categories, as is done in barplots, the default ggplot2 behavior is to order the categories alphabetically when they are defined by character strings. If they are defined by factors, they are ordered by the factor levels. We rarely want to use alphabetical order. Instead, we should order by a meaningful quantity. In all the cases above, the barplots were ordered by the values being displayed. The exception was the graph comparing browsers across two barplots. In that case, we kept the order the same in both barplots to ease the comparison. Specifically, instead of ordering the browsers separately for each year, we ordered both years by the average of the 2000 and 2015 values.
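To see the two default behaviors described above, here is a minimal sketch in base R. The browser names are hypothetical values chosen only for illustration, and the explicit level order is arbitrary. When a character vector is converted to a factor, the levels come out sorted alphabetically, which is the order ggplot2 shows for character categories; a factor with explicit levels keeps the order we give it.

```r
# Hypothetical category names, used only for illustration
browser <- c("Opera", "Chrome", "Safari", "Firefox")

# Character strings: the levels are sorted alphabetically
levels(factor(browser))
#> [1] "Chrome"  "Firefox" "Opera"   "Safari"

# Factor with explicit levels: the order we specify is preserved
browser_f <- factor(browser, levels = c("Safari", "Opera", "Firefox", "Chrome"))
levels(browser_f)
#> [1] "Safari"  "Opera"   "Firefox" "Chrome"
```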
We previously learned how to use the reorder function, which helps us achieve this goal. To appreciate how the right order can help convey a message, suppose we want to create a plot to compare the murder rate across states. We are particularly interested in the most dangerous and safest states. Note the difference when we order alphabetically (the default) versus when we order by the actual rate:
We can make the second plot like this:
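library(dplyr)    # provides mutate() and the %>% pipe
library(ggplot2)
library(dslabs)   # assumed source of the murders dataset
data(murders)
murders %>%
  mutate(murder_rate = total / population * 100000) %>%  # murders per 100,000 people
  mutate(state = reorder(state, murder_rate)) %>%        # order states by rate
  ggplot(aes(state, murder_rate)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.text.y = element_text(size = 6)) +
  xlab("")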
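To check what reorder does to the category order, we can inspect the factor levels directly. The following is a small sketch on a made-up data frame; the names toy and by_rate, the category labels, and the rates are all hypothetical and are not taken from the murders data.

```r
# Hypothetical rates for three made-up categories (not real data)
toy <- data.frame(state = c("A", "B", "C"),
                  murder_rate = c(2.5, 0.8, 4.1))

# Default: levels are alphabetical
levels(factor(toy$state))
#> [1] "A" "B" "C"

# reorder() sorts the levels by the accompanying value, smallest first
by_rate <- reorder(toy$state, toy$murder_rate)
levels(by_rate)
#> [1] "B" "A" "C"

# Negating the value reverses the order, largest first
levels(reorder(toy$state, -toy$murder_rate))
#> [1] "C" "A" "B"
```

Because ggplot2 places categories along the axis in level order, changing the levels this way is what changes the order of the bars.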
The reorder function lets us reorder groups as well. Earlier we saw an example related to income distributions across regions. Here are the two versions plotted against each other:
The first orders the regions alphabetically, while the second orders them by each region’s median income.
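When each group has several observations, reorder summarizes them with a function before sorting; the FUN argument selects that summary, and it defaults to the mean. Below is a minimal sketch of ordering by the median. The data frame toy, the region names, and the income values are hypothetical and were chosen so that the median and the mean give different orders.

```r
# Hypothetical incomes for three made-up regions (not real data)
toy <- data.frame(
  region = rep(c("East", "North", "South"), each = 3),
  income = c(1, 2, 30,   # East:  median 2, mean 11
             8, 9, 10,   # North: median 9, mean 9
             4, 5, 6)    # South: median 5, mean 5
)

# Order the regions by their median income
levels(reorder(toy$region, toy$income, FUN = median))
#> [1] "East"  "South" "North"

# The default summary is the mean, which gives a different order here
levels(reorder(toy$region, toy$income))
#> [1] "South" "North" "East"
```

The median is often the safer choice for skewed quantities such as income, since a single large value (the 30 in the East group above) can move the mean a lot without changing the median.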