With dplyr we can perform a series of operations, for example select and then filter, by sending the results of one function to another using what is called the pipe operator: %>%. Some details are included below.
We wrote code above to show three variables (state, region, rate) for
states that have murder rates at or below 0.71. To do this, we defined the
intermediate object new_table. In dplyr we can write code that
looks more like a description of what we want to do, without intermediate
objects. For such an operation, we can use the pipe %>%. The code looks like
this:
murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
#>           state        region  rate
#> 1        Hawaii          West 0.515
#> 2          Iowa North Central 0.689
#> 3 New Hampshire     Northeast 0.380
#> 4  North Dakota North Central 0.595
#> 5       Vermont     Northeast 0.320
This line of code is equivalent to the two lines of code above. What is going on here?
In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side of the pipe. Here is a very simple example:
16 %>% sqrt()
#> [1] 4
We can continue to pipe values along:
16 %>% sqrt() %>% log2()
#> [1] 2
The above statement is equivalent to log2(sqrt(16)).
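As a quick check of this equivalence, the piped and nested forms can be compared directly. This is a minimal sketch that assumes the magrittr package is installed; the names piped and nested are introduced here only for illustration.

```r
library(magrittr)  # provides %>%; dplyr re-exports the same operator

# Piped form: each result becomes the first argument of the next function
piped <- 16 %>% sqrt() %>% log2()

# Nested form: the same calls written inside-out
nested <- log2(sqrt(16))

# Both forms evaluate the same two function calls in the same order
identical(piped, nested)
```

Reading the piped form left to right gives the order in which the functions run, whereas the nested form has to be read from the inside out.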
Remember that the pipe sends values to the first argument, so we can define other arguments as if the first argument is already defined:
16 %>% sqrt() %>% log(base = 2)
#> [1] 2
Therefore, when using the pipe with data frames and dplyr, we no longer need to specify the required first argument since the dplyr functions we have described all take the data as the first argument. In the code we wrote:
murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
murders is the first argument of the select function, and the new data frame (formerly new_table) is the first argument of the filter function.
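The correspondence between the two styles can be sketched on a small stand-in table. The data frame toy and its values are made up for illustration and are not the murders data; two_step and piped are likewise hypothetical names. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for the murders table (illustrative values only)
toy <- data.frame(
  state      = c("A", "B", "C"),
  region     = c("West", "South", "Northeast"),
  population = c(100, 200, 300),
  rate       = c(0.50, 3.20, 0.70),
  stringsAsFactors = FALSE
)

# Two-step version: the data frame is passed explicitly as the first argument
new_table <- select(toy, state, region, rate)
two_step  <- filter(new_table, rate <= 0.71)

# Piped version: the left-hand side is supplied as the first argument
piped <- toy %>% select(state, region, rate) %>% filter(rate <= 0.71)

# Compare the two results
identical(two_step, piped)
```

In the piped version, toy plays the role of murders, and the unnamed result of select takes the place of new_table.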
Note that the pipe works well with functions where the first argument is the input data. Functions in tidyverse packages like dplyr have this format and can be used easily with the pipe.
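When the piped value is not meant to be the first argument, the magrittr pipe offers a placeholder, written as a dot, that marks where the value should go. The following is a minimal sketch assuming the magrittr package is installed; the names as_first and as_base are introduced only for illustration.

```r
library(magrittr)

# sqrt() takes its input as the first argument, so the plain pipe is enough
as_first <- 16 %>% sqrt()

# Here the piped value is meant to be the base, which is not the first
# argument of log(), so the placeholder `.` marks its position.
# This is the same call as log(8, base = 2).
as_base <- 2 %>% log(8, base = .)

as_first
as_base
```

Functions that take the data first, as the dplyr functions described here do, never need the placeholder, which is part of why they combine so easily with the pipe.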