R provides a powerful and convenient way of indexing vectors. We can, for example, subset a vector based on properties of another vector. In this section, we continue working with our US murders example, which we can load like this:
library(dslabs)
data("murders")
We have now calculated the murder rate using:
murder_rate <- murders$total / murders$population * 100000
Imagine you are moving from Italy where, according to an ABC news report, the murder rate is only 0.71 per 100,000. You would prefer to move to a state with a similar murder rate. Another powerful feature of R is that we can use logicals to index vectors. If we compare a vector to a single number, R performs the comparison for each entry. The following is an example related to the question above:
ind <- murder_rate < 0.71
If we instead want to know whether a value is less than or equal to 0.71, we can use:
ind <- murder_rate <= 0.71
Note that we get back a logical vector with TRUE for each entry
smaller than or equal to 0.71. To see which states these are, we can
leverage the fact that vectors can be indexed with logicals.
murders$state[ind]
#> [1] "Hawaii" "Iowa" "New Hampshire" "North Dakota"
#> [5] "Vermont"
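The same pattern can be tried on a small made-up vector, where the element-by-element behavior is easy to check by eye. This is only a sketch: the names toy_rates, toy_places, and is_low and their values are hypothetical and are not taken from the murders dataset.

```r
# Hypothetical rates for four made-up places (not from the murders data)
toy_rates  <- c(0.5, 2.1, 0.71, 3.4)
toy_places <- c("A", "B", "C", "D")

# The comparison is applied to every entry and returns a logical vector
is_low <- toy_rates <= 0.71
is_low
#> [1]  TRUE FALSE  TRUE FALSE

# Indexing with the logical vector keeps only the entries where is_low is TRUE
toy_places[is_low]
#> [1] "A" "C"
```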
To count how many entries are TRUE, we can use the function sum, which
returns the sum of the entries of a vector. Logical vectors get coerced
to numeric, with TRUE coded as 1 and FALSE as 0. Thus we can count the
states using:
sum(ind)
#> [1] 5
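The coercion can be made visible on a short example. The vector flags below is hypothetical; as.numeric is used only to show the 1/0 coding that sum relies on.

```r
# A hypothetical logical vector
flags <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Explicit coercion shows TRUE coded as 1 and FALSE as 0
as.numeric(flags)
#> [1] 1 0 1 1 0

# Summing the logical vector therefore counts the TRUE entries
sum(flags)
#> [1] 3
```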
Suppose we like the mountains and we want to move to a safe state in the
western region of the country. We want the murder rate to be at most 1.
In this case, we want two different things to be true. Here we can use
the logical operator and, which in R is represented with &. This
operation results in TRUE only when both logicals are TRUE. To see
this, consider this example:
TRUE & TRUE
#> [1] TRUE
TRUE & FALSE
#> [1] FALSE
FALSE & FALSE
#> [1] FALSE
For our example, we can form two logicals:
west <- murders$region == "West"
safe <- murder_rate <= 1
and we can use the & to get a vector of logicals that tells us which
states satisfy both conditions:
ind <- safe & west
murders$state[ind]
#> [1] "Hawaii" "Idaho" "Oregon" "Utah" "Wyoming"
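When & is applied to two vectors, the comparison is made position by position. A minimal sketch with two hypothetical logical vectors (is_safe and is_west are made-up stand-ins, not computed from the data):

```r
# Two hypothetical logical vectors of the same length
is_safe <- c(TRUE, TRUE, FALSE, FALSE)
is_west <- c(TRUE, FALSE, TRUE, FALSE)

# The result is TRUE only in positions where both vectors are TRUE
both <- is_safe & is_west
both
#> [1]  TRUE FALSE FALSE FALSE
```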
which
Suppose we want to look up California’s murder rate. For this type of
operation, it is convenient to convert vectors of logicals into indexes
instead of keeping long vectors of logicals. The function which tells
us which entries of a logical vector are TRUE. So we can type:
ind <- which(murders$state == "California")
murder_rate[ind]
#> [1] 3.37
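To see what which does to a logical vector, it can help to look at the intermediate comparison on a small example. The vector toy_places is hypothetical.

```r
# Hypothetical vector of names
toy_places <- c("A", "B", "C", "D")

# The comparison gives one logical per entry
toy_places == "C"
#> [1] FALSE FALSE  TRUE FALSE

# which reduces it to the positions that are TRUE
pos <- which(toy_places == "C")
pos
#> [1] 3
```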
match
If instead of just one state we want to find out the murder rates for
several states, say New York, Florida, and Texas, we can use the
function match. This function tells us which indexes of a second
vector match each of the entries of a first vector:
ind <- match(c("New York", "Florida", "Texas"), murders$state)
ind
#> [1] 33 10 44
Now we can look at the murder rates:
murder_rate[ind]
#> [1] 2.67 3.40 3.20
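A small sketch of the same idea, using hypothetical vectors toy_places and toy_rates. It also shows that match returns positions in the order of the first argument, and that an entry with no match gives NA rather than an error.

```r
# Hypothetical names and rates (not from the murders data)
toy_places <- c("A", "B", "C", "D")
toy_rates  <- c(0.5, 2.1, 0.71, 3.4)

# Positions in toy_places of each requested name, in the order requested
idx <- match(c("D", "A"), toy_places)
idx
#> [1] 4 1

# The positions can then index a parallel vector
toy_rates[idx]
#> [1] 3.4 0.5

# A name that does not appear in the second vector yields NA
match("Z", toy_places)
#> [1] NA
```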
%in%
If rather than an index we want a logical that tells us whether or not
each element of a first vector is in a second, we can use the function
%in%. Let’s imagine you are not sure if Boston, Dakota, and Washington
are states. You can find out like this:
c("Boston", "Dakota", "Washington") %in% murders$state
#> [1] FALSE FALSE TRUE
Note that we will be using %in% often throughout the book.
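As a quick check on a hypothetical vector (toy_places is made up for illustration), the result of %in% has one entry per element of the left-hand vector:

```r
# Hypothetical vector of names
toy_places <- c("A", "B", "C", "D")

# One logical per element on the left: is it found anywhere on the right?
found <- c("C", "Z") %in% toy_places
found
#> [1]  TRUE FALSE

# The result is as long as the left-hand vector
length(found)
#> [1] 2
```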
Advanced: There is a connection between match and %in% through
which. To see this, notice that the following two lines produce the
same index (although in different order):
match(c("New York", "Florida", "Texas"), murders$state)
#> [1] 33 10 44
which(murders$state %in% c("New York", "Florida", "Texas"))
#> [1] 10 33 44
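The relationship can be checked on a small example. This sketch uses hypothetical vectors toy_places and wanted, and assumes every value in wanted appears exactly once in toy_places; under that assumption, sorting the match result gives the same positions as which combined with %in%.

```r
# Hypothetical vectors; each wanted value appears exactly once in toy_places
toy_places <- c("A", "B", "C", "D")
wanted     <- c("D", "A")

# match follows the order of wanted
by_match <- match(wanted, toy_places)
by_match
#> [1] 4 1

# which with %in% follows the order of toy_places
by_in <- which(toy_places %in% wanted)
by_in
#> [1] 1 4

# After sorting, the two sets of positions agree
identical(sort(by_match), by_in)
#> [1] TRUE
```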