The tidyverse functions know how to interpret grouped tibbles. Furthermore, to facilitate stringing commands through the pipe %>%, tidyverse functions consistently return data frames, since this assures that the output of a function is accepted as the input of another. But most R functions do not recognize grouped tibbles nor do they return data frames. The quantile function is an example we described in Section 4.71. The do function serves as a bridge between R functions such as quantile and the tidyverse. The do function understands grouped tibbles and always returns a data frame.

In Section 4.72, we noted that if we attempt to use quantile to obtain the min, median and max in one call, we will receive an error: Error: expecting result of length one, got : 2.

data(heights)
heights %>% 
  filter(sex == "Female") %>%
  summarize(range = quantile(height, c(0, 0.5, 1)))

We can use the do function to fix this.

First we have to write a function that fits into the tidyverse approach: that is, it receives a data frame and returns a data frame.

my_summary <- function(dat){
  x <- quantile(dat$height, c(0, 0.5, 1))
  tibble(min = x[1], median = x[2], max = x[3])
}

We can now apply the function to the heights dataset to obtain the summaries:

heights %>% 
  group_by(sex) %>% 
  my_summary
#> # A tibble: 1 x 3
#>     min median   max
#>   <dbl>  <dbl> <dbl>
#> 1    50   68.5  82.7

But this is not what we want. We want a summary for each sex and the code returned just one summary. This is because my_summary is not part of the tidyverse and does not know how to handled grouped tibbles. do makes this connection:

heights %>% 
  group_by(sex) %>% 
  do(my_summary(.))
#> # A tibble: 2 x 4
#> # Groups:   sex [2]
#>   sex      min median   max
#>   <fct>  <dbl>  <dbl> <dbl>
#> 1 Female    51   65.0  79  
#> 2 Male      50   69    82.7

Note that here we need to use the dot operator. The tibble created by group_by is piped to do. Within the call to do, the name of this tibble is . and we want to send it to my_summary. If you do not use the dot, then my_summary has no argument and returns an error telling us that argument "dat" is missing. You can see the error by typing:

heights %>% 
  group_by(sex) %>% 
  do(my_summary())

If you do not use the parenthesis, then the function is not executed and instead do tries to return the function. This gives an error because do must always return a data frame. You can see the error by typing:

heights %>% 
  group_by(sex) %>% 
  do(my_summary)