This exercise builds on the previous exercise. Start by redefining murders to include rate and rank:
library(dplyr)
library(dslabs)
data(murders)
murders <- mutate(murders, rate=total/population*100000, rank=rank(-rate))
The dplyr function filter is used to choose specific rows of the data frame to keep. Unlike select, which is for columns, filter is for rows.
# row(s) where the state name is "New York"
new_york <- filter(murders, state == "New York")
# rows where the state name is shorter than 10 characters
# (nchar counts the characters of each name; length(state) would give the number of rows)
short_states <- filter(murders, nchar(state) < 10)
# even-numbered rows
even_rows <- filter(murders, seq_len(nrow(murders)) %% 2 == 0)
# rows with a population larger than 11 million
large_states <- filter(murders, population / 1000000 > 11)
We typically use variables from the data frame to define the filter criterion, but you can use any logical vector that has one element per row of the data frame. A single TRUE or FALSE is also accepted and is applied to every row.
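As a minimal sketch of filtering with a logical vector that is built outside the data frame, the example below uses a small made-up data frame (toy and its columns are invented here, not part of dslabs) so that the result is easy to check by hand. It assumes a current dplyr version.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  name = c("a", "b", "c", "d"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# A logical vector defined outside the data frame, one element per row
keep <- c(TRUE, FALSE, TRUE, FALSE)
kept_rows <- filter(toy, keep)

# A condition computed from a column produces the same kind of vector
big_values <- filter(toy, value > 25)
```

Here kept_rows holds the rows named a and c, and big_values holds the rows named c and d.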
Let the examples above inspire you to solve the following questions:
Use filter to show the entries with a murder rate lower than 1. Store your result in safe_cities.
Create a new data frame called no_south that removes states from the “South” region. You can use the != operator, which does the opposite of the == operator used in the first example.
We can also use %in% to filter with dplyr. Create a new data frame called murders_ne_w with only the states from the Northeast and West regions.
Hint
The %in% operator, which is part of base R rather than dplyr, checks for each element of the first vector whether it is contained in the second vector. For example, c("Mango", "Apple", "Pears", "Banana", "Mango") %in% c("Mango", "Banana") returns c(TRUE, FALSE, FALSE, TRUE, TRUE). You can use the result of this operation as a filter condition.
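To illustrate the hint without touching the murders data, here is a small sketch on a made-up data frame (fruit and its columns are invented for this example). It shows %in% used as a filter condition, and the same condition negated with !.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
fruit <- data.frame(
  name = c("Mango", "Apple", "Pears", "Banana", "Mango"),
  price = c(3, 1, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Keep the rows whose name appears in the lookup vector
tropical <- filter(fruit, name %in% c("Mango", "Banana"))

# Negating the condition with ! keeps the remaining rows
not_tropical <- filter(fruit, !(name %in% c("Mango", "Banana")))
```

The first call keeps rows 1, 4 and 5, matching the TRUE positions in the hint, and the second keeps the Apple and Pears rows.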