This exercise builds upon the previous exercise. Start by redefining murders to include rate and rank:

library(dplyr)
library(dslabs)
data(murders)
murders <- mutate(murders, rate=total/population*100000, rank=rank(-rate))

The dplyr function filter keeps the rows of a data frame that satisfy a condition. Unlike select, which picks columns, filter picks rows.

Examples

# row(s) where the state name is "New York"
new_york <- filter(murders, state == "New York")

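# rows where the state name is shorter than 10 characters
# (nchar counts characters per element; length(state) would return the number of states)
short_states <- filter(murders, nchar(state) < 10)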

# even rows (the row count is computed instead of hard-coding 51)
even_rows <- filter(murders, seq_len(nrow(murders)) %% 2 == 0)

# rows with a population larger than 11 million
large_states <- filter(murders, population / 1000000 > 11)

We typically use variables from the data frame to define the filter criterion, but you can use any logical vector whose length equals the number of rows in the data frame. A single TRUE or FALSE is also accepted and is applied to every row; other lengths produce an error.
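To see how filter treats a logical vector that does not come from the data frame itself, here is a minimal sketch. It assumes a recent dplyr version, and the toy data frame fruits and the vector keep are made up for illustration; they are not part of dslabs.

```r
library(dplyr)

# Hypothetical toy data, not part of the murders dataset
fruits <- data.frame(
  name  = c("Mango", "Apple", "Pears", "Banana"),
  price = c(3, 1, 2, 1),
  stringsAsFactors = FALSE
)

# A logical vector created outside the data frame, one value per row
keep <- c(TRUE, FALSE, FALSE, TRUE)
kept <- filter(fruits, keep)

# A condition on a column also evaluates to one TRUE/FALSE per row
cheap <- filter(fruits, price < 2)

# A logical vector with 2 values for 4 rows is rejected with an error,
# which is caught here so the script keeps running
wrong_length <- tryCatch(
  filter(fruits, c(TRUE, FALSE)),
  error = function(e) "error"
)
```

Here kept holds the Mango and Banana rows and cheap holds the Apple and Banana rows, while wrong_length ends up as the string "error" because the vector had neither one value per row nor a single value.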

Exercise

Let the examples above inspire you to solve the following questions:

  1. Use filter to show the entries with a murder rate lower than 1. Store your result in safe_cities.

  2. Create a new data frame called no_south that removes states from the “South” region. You can use the != operator, which does the opposite of the == operator used in the first example.

  3. We can also use %in% to filter with dplyr. Create a new data frame called murders_ne_w with only the states from the Northeast and West regions.

    Hint

    The %in% operator, which is part of base R rather than dplyr, checks for each element of the first vector whether it is contained in the second vector. For example, c("Mango", "Apple", "Pears", "Banana", "Mango") %in% c("Mango", "Banana") returns c(TRUE, FALSE, FALSE, TRUE, TRUE). You can use the result of this operation as a filter condition.
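
The %in% check from the hint can be tried on a small data frame before applying it to murders. The sketch below uses a made-up data frame called basket (a hypothetical name, not part of dslabs) and shows %in% both on its own and inside filter.

```r
library(dplyr)

# Hypothetical toy data, not part of the murders dataset
basket <- data.frame(
  fruit = c("Mango", "Apple", "Pears", "Banana", "Mango"),
  count = c(2, 5, 1, 6, 3),
  stringsAsFactors = FALSE
)

# %in% yields one TRUE/FALSE per element of the left-hand vector
is_tropical <- basket$fruit %in% c("Mango", "Banana")

# Keep the rows whose fruit appears in the given set
tropical <- filter(basket, fruit %in% c("Mango", "Banana"))

# Wrap the check in !( ) to keep the rows that are NOT in the set
not_tropical <- filter(basket, !(fruit %in% c("Mango", "Banana")))
```

With this data, tropical keeps three rows (Mango, Banana, Mango) and not_tropical keeps the remaining two (Apple, Pears).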