dplyr pipe operator

With dplyr we can perform a series of operations, for example select and then filter, by sending the result of one function to the next using the pipe operator, %>%. The operator comes from the magrittr package and is available once dplyr is loaded. Some details are included below.

We wrote code above to show three variables (state, region, rate) for states that have murder rates of 0.71 or lower. To do this, we defined the intermediate object new_table. In dplyr we can write code that looks more like a description of what we want to do, without intermediate objects:

\[\mbox{original data } \rightarrow \mbox{ select } \rightarrow \mbox{ filter }\]

For such an operation, we can use the pipe %>%. The code looks like this:

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
#>           state        region  rate
#> 1        Hawaii          West 0.515
#> 2          Iowa North Central 0.689
#> 3 New Hampshire     Northeast 0.380
#> 4  North Dakota North Central 0.595
#> 5       Vermont     Northeast 0.320

This line of code is equivalent to the two lines of code above. What is going on here?
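
For comparison, here is a minimal sketch that runs both versions side by side and checks that they agree. It assumes that murders is the table from the dslabs package, that rate was added beforehand as murders per 100,000 people, and that the earlier two-line version looked like the one reconstructed here; none of these details is spelled out in this section.

```r
library(dplyr)
library(dslabs)  # assumption: murders comes from the dslabs package

# Assumption: rate is murders per 100,000 people, added beforehand with mutate
murders <- mutate(murders, rate = total / population * 100000)

# Two-step version with an intermediate object
new_table <- select(murders, state, region, rate)
two_step <- filter(new_table, rate <= 0.71)

# Piped version without the intermediate object
piped <- murders %>% select(state, region, rate) %>% filter(rate <= 0.71)

# Compare the two results
identical(two_step, piped)
```

If the assumptions hold, the comparison should return TRUE, since both versions apply the same two functions in the same order.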

In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side of the pipe. Here is a very simple example:

16 %>% sqrt()
#> [1] 4

We can continue to pipe values along:

16 %>% sqrt() %>% log2()
#> [1] 2

The above statement is equivalent to log2(sqrt(16)).

Remember that the pipe sends values to the first argument, so we can define other arguments as if the first argument is already defined:

16 %>% sqrt() %>% log(base = 2)
#> [1] 2
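
To make the first-argument rule concrete, the sketch below uses a small helper, power, which is hypothetical and defined only for this illustration. Because the piped value fills the first argument, x, the 3 supplied in the call is matched to the next argument, exponent.

```r
library(dplyr)  # loading dplyr makes %>% available

# Hypothetical helper for illustration: raises x to the given exponent
power <- function(x, exponent) x^exponent

piped  <- 2 %>% power(3)   # the pipe supplies x = 2, so 3 becomes exponent
direct <- power(2, 3)      # the same call written without the pipe

piped
#> [1] 8
identical(piped, direct)
#> [1] TRUE
```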

Therefore, when using the pipe with data frames and dplyr, we no longer need to specify the required first argument since the dplyr functions we have described all take the data as the first argument. In the code we wrote:

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)

murders is the first argument of the select function, and the new data frame (formerly new_table) is the first argument of the filter function.

Note that the pipe works well with functions where the first argument is the input data. Functions in tidyverse packages like dplyr have this format and can be used easily with the pipe.
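
When the input data is not the first argument, magrittr provides a dot placeholder that marks where the piped value should go. The sketch below illustrates this with shift_down, a hypothetical function written only for this example, whose data argument comes second.

```r
library(dplyr)

# Hypothetical function for illustration: the data is the second argument
shift_down <- function(amount, values) values - amount

# Without a placeholder, the piped vector would be matched to `amount`.
# The dot tells %>% to pass the vector as `values` instead.
shifted <- c(10, 20, 30) %>% shift_down(1, values = .)

shifted
#> [1]  9 19 29
```

Functions that take the data first, as the dplyr functions do, avoid the need for the placeholder altogether.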

Other pipes from the magrittr package are described on the magrittr reference page. We will not be using these in the exercises, but they can be useful. R also has a native pipe, |>. We will not be using it in the exercises because its behavior is not exactly the same as that of the %>% operator.
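
As a small illustration of how the two pipes relate, the sketch below assumes R 4.1.0 or later, where |> is available. For simple chains the two give the same result. One difference is that the native pipe requires the right-hand side to be written as a function call with parentheses, while %>% also accepts a bare function name.

```r
library(dplyr)

magrittr_result <- 16 %>% sqrt() %>% log2()
native_result   <- 16 |> sqrt() |> log2()   # requires R >= 4.1.0

identical(magrittr_result, native_result)
#> [1] TRUE

16 %>% sqrt      # works: %>% accepts a bare function name
#> [1] 4
# 16 |> sqrt     # syntax error: the native pipe needs sqrt()
```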