This exercise builds on the previous one, so you don't have to load any libraries or datasets. We have created an object, murders_expanded, which contains the murders dataset with the extra rate and rank columns from the previous exercise. Use murders_expanded, not the original murders object!
4. The dplyr function filter is used to choose which rows of a data frame to keep. Unlike select, which is for columns, filter is for rows. For example, you can show just the New York row like this:
filter(murders_expanded, state == "New York")
You can use other logical vectors to filter rows as well. Use filter to show the 5 states with the highest murder rates. The murder rate and rank columns have already been added for you, so remember that you can filter based on the rank column. Store your result in deadly_cities.
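A minimal sketch of one possible solution. So that it runs on its own, it assumes the murders data comes from the dslabs package and rebuilds murders_expanded the way the previous exercise is assumed to have done it (rate as murders per 100,000 people, rank 1 for the highest rate). In the exercise environment murders_expanded already exists, so only the filter step is needed there.

```r
library(dplyr)
library(dslabs)
data(murders)

# Assumed reconstruction of murders_expanded; skip this step in the exercise
murders_expanded <- mutate(murders,
                           rate = total / population * 100000,
                           rank = rank(-rate))

# Keep the five rows with the highest murder rates (ranks 1 to 5)
deadly_cities <- filter(murders_expanded, rank <= 5)
deadly_cities
```

Because rank 1 is the highest rate, the condition rank <= 5 selects the top five without having to sort the data first.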
5. We can remove rows using the != operator. For example, to remove Florida, we would do this:
no_florida <- filter(murders_expanded, state != "Florida")
Create a new data frame called no_south that removes the states in the South region. How many states remain in no_south? You can use the function nrow for this. Store the number in nr_no_south.
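A minimal sketch, again assuming murders comes from dslabs and that murders_expanded is built as in the previous exercise (in the exercise itself, skip the setup and start at the filter call). The region column is a factor, and comparing it with the string "South" using != works as expected.

```r
library(dplyr)
library(dslabs)
data(murders)

# Assumed reconstruction of murders_expanded; skip this step in the exercise
murders_expanded <- mutate(murders,
                           rate = total / population * 100000,
                           rank = rank(-rate))

# Drop every row whose region is "South"
no_south <- filter(murders_expanded, region != "South")

# Count the rows that remain
nr_no_south <- nrow(no_south)
nr_no_south
```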
6. We can also use %in% to filter with dplyr. For example, you can see the data for New York and Texas like this:
filter(murders_expanded, state %in% c("New York", "Texas"))
Create a new data frame called murders_ne_w with only the states from the Northeast and West regions. How many states are in this category? Store the number in nr_murders_ne_w.
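A minimal sketch under the same assumptions as before (dslabs murders data, murders_expanded rebuilt as in the previous exercise; skip the setup in the exercise itself). %in% converts the factor region to character before matching, so it can be compared directly against a character vector of region names.

```r
library(dplyr)
library(dslabs)
data(murders)

# Assumed reconstruction of murders_expanded; skip this step in the exercise
murders_expanded <- mutate(murders,
                           rate = total / population * 100000,
                           rank = rank(-rate))

# Keep rows whose region is either "Northeast" or "West"
murders_ne_w <- filter(murders_expanded, region %in% c("Northeast", "West"))

# Count the rows that were kept
nr_murders_ne_w <- nrow(murders_ne_w)
nr_murders_ne_w
```

The condition is equivalent to region == "Northeast" | region == "West", but %in% stays readable as the list of regions grows.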