The tidyverse functions know how to interpret grouped tibbles.
Furthermore, to facilitate stringing commands through the pipe %>%,
tidyverse functions consistently return data frames, since this ensures
that the output of one function is accepted as the input of another. But
most R functions neither recognize grouped tibbles nor return data
frames. The quantile function is an example we described in
Section 4.71. The do function serves as a bridge between R functions
such as quantile and the tidyverse. The do function understands grouped
tibbles and always returns a data frame.
In Section 4.72, we noted that if we attempt to use quantile to obtain
the min, median and max in one call, we will receive an error:
Error: expecting result of length one, got : 2.
library(dplyr)
library(dslabs)
data(heights)
heights %>%
  filter(sex == "Female") %>%
  summarize(range = quantile(height, c(0, 0.5, 1)))
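The error message indicates that a single value was expected, whereas quantile returns one value per requested probability. The following is a minimal base R sketch of that mismatch; the vector x is made up for illustration and is not the heights data.

```r
# Hypothetical values standing in for heights (not the real data)
x <- c(60, 62, 65, 68, 70)

# quantile returns one value per requested probability
q <- quantile(x, c(0, 0.5, 1))

length(q)  # 3: three values where a single summary value was expected
names(q)   # "0%" "50%" "100%"
```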
We can use the do function to fix this.
First we have to write a function that fits into the tidyverse approach: that is, it receives a data frame and returns a data frame.
my_summary <- function(dat){
  x <- quantile(dat$height, c(0, 0.5, 1))
  tibble(min = x[1], median = x[2], max = x[3])
}
We can now apply the function to the heights dataset to obtain the summaries:
heights %>%
  group_by(sex) %>%
  my_summary
#> # A tibble: 1 x 3
#>     min median   max
#>   <dbl>  <dbl> <dbl>
#> 1    50   68.5  82.7
But this is not what we want. We want a summary for each sex, and the
code returned just one summary. This is because my_summary is not part
of the tidyverse and does not know how to handle grouped tibbles. do
makes this connection:
heights %>%
  group_by(sex) %>%
  do(my_summary(.))
#> # A tibble: 2 x 4
#> # Groups:   sex [2]
#>   sex      min median   max
#>   <fct>  <dbl>  <dbl> <dbl>
#> 1 Female    51   65.0  79
#> 2 Male      50   69    82.7
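Conceptually, the result above corresponds to splitting the data by group, applying the function to each piece, and stacking the per-group data frames. The following base R sketch illustrates that idea only; it is not how do is implemented, and the toy data and the helper my_summary_base are hypothetical names introduced for illustration.

```r
# Hypothetical toy data (not the real heights dataset)
dat <- data.frame(
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  height = c(60, 64, 66, 65, 70, 74)
)

# Base R variant of the summary function: data frame in, data frame out
my_summary_base <- function(d) {
  x <- quantile(d$height, c(0, 0.5, 1))
  data.frame(min = x[[1]], median = x[[2]], max = x[[3]])
}

# Split by group, apply the function to each piece, then stack the results
pieces <- split(dat, dat$sex)
res <- do.call(rbind, lapply(names(pieces), function(g) {
  data.frame(sex = g, my_summary_base(pieces[[g]]))
}))

res  # one row per sex, with columns sex, min, median, max
```

Because each call returns a one-row data frame, the pieces can be stacked without any special handling, which is why the summary function is written to return a data frame.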
Note that here we need to use the dot operator. The tibble created by
group_by is piped to do. Within the call to do, this tibble is referred
to as . (the dot), and we want to pass it to my_summary. If you do not
use the dot, then my_summary receives no argument and throws an error
telling us that argument "dat" is missing. You can see the error by
typing:
heights %>%
  group_by(sex) %>%
  do(my_summary())
If you do not use the parentheses, then the function is not executed and
instead do tries to return the function itself. This gives an error
because do must always return a data frame. You can see the error by
typing:
heights %>%
  group_by(sex) %>%
  do(my_summary)
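Both mistakes can be reproduced outside of do. The sketch below uses a hypothetical base R variant of the summary function, my_summary_base, to show that calling it with no argument fails, and that the bare function name is a function object rather than a data frame.

```r
# Hypothetical base R variant of the summary function
my_summary_base <- function(d) {
  x <- quantile(d$height, c(0, 0.5, 1))
  data.frame(min = x[[1]], median = x[[2]], max = x[[3]])
}

# Calling with no argument fails as soon as `d` is needed
err <- tryCatch(my_summary_base(), error = function(e) e)
conditionMessage(err)  # reports that the argument is missing

# Without parentheses, the name refers to the function itself
is.function(my_summary_base)    # TRUE
is.data.frame(my_summary_base)  # FALSE
```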