The pipe %>% can be used to perform operations sequentially without having to define intermediate objects.
So far, multiple data frame operations were carried out independently of each other by storing intermediate results in variables:
library(dplyr)
library(dslabs)
data(murders)
murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))
my_states <- select(murders, state, rate, rank)
The pipe %>% permits us to perform both operations sequentially without having to define an intermediate variable. We could therefore have mutated and selected in a single statement like this:
library(dplyr)
library(dslabs)
data(murders)
my_states <- murders %>%
  mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
  # TODO: insert a filter() call here that keeps states from the Northeast or West region with a murder rate lower than 1
  select(state, rate, rank)
Pitfall
Notice that mutate and select no longer have a data frame as the first argument. The first argument is assumed to be the result of the operation conducted right before the %>%.
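The first-argument rule can be checked directly. The sketch below is a minimal illustration, assuming dplyr is installed; the toy data frame and its values are made up for the example and are not part of the murders dataset.

```r
library(dplyr)  # attaching dplyr also makes the %>% operator available

# Piping a value into a function passes it as that function's first argument,
# so a chain of pipes is equivalent to nested calls.
piped  <- 16 %>% sqrt() %>% log2()
nested <- log2(sqrt(16))
identical(piped, nested)

# The same holds for data frames (toy is a hypothetical example data frame).
toy <- data.frame(
  state      = c("A", "B"),
  total      = c(5, 20),
  population = c(1e6, 2e6),
  stringsAsFactors = FALSE
)
with_pipe    <- toy %>% mutate(rate = total / population * 100000)
without_pipe <- mutate(toy, rate = total / population * 100000)
identical(with_pipe, without_pipe)
```

Both identical() calls should report that the piped and the non-piped versions give the same result, which is why the data frame can be left out of each call in a pipeline.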
Complete the code above so that it returns state, rate and rank for all states located in the Northeast or West region that have a murder rate lower than 1. The data frame should be stored in my_states (as in the example code).
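As a hint on how a row filter fits into a pipeline, here is a small sketch on unrelated toy data. The data frame scores and its columns name, team and score are hypothetical and chosen only for illustration; the example assumes dplyr is installed and is not the solution to the exercise.

```r
library(dplyr)

# Hypothetical toy data, unrelated to the murders dataset.
scores <- data.frame(
  name  = c("Ann", "Bob", "Cem", "Dee"),
  team  = c("red", "blue", "green", "red"),
  score = c(2.5, 0.5, 0.8, 0.9),
  stringsAsFactors = FALSE
)

# filter() keeps the rows for which the condition is TRUE;
# %in% tests membership in a set of values, & combines two conditions.
low_scores <- scores %>%
  filter(team %in% c("red", "blue") & score < 1) %>%
  select(name, score)

low_scores
```

Note that the filter step must come before select here, because select drops the team column that the condition relies on.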