Now that we have mastered some basic R knowledge, let’s try to gain some insights into the safety of different states in the context of gun murders.

sort

Say we want to rank the states from least to most gun murders. The function sort sorts a vector in increasing order. We can therefore see the largest number of gun murders, which appears last in the output, by typing:

library(dslabs)
data(murders)
sort(murders$total)
#>  [1]    2    4    5    5    7    8   11   12   12   16   19   21   22
#> [14]   27   32   36   38   53   63   65   67   84   93   93   97   97
#> [27]   99  111  116  118  120  135  142  207  219  232  246  250  286
#> [40]  293  310  321  351  364  376  413  457  517  669  805 1257

However, this does not give us information about which states have which murder totals. For example, we don’t know which state had 1257.
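If the largest values are of most interest, sort can also list them first through its decreasing argument. The following is a minimal sketch on a small vector; the name totals and its values are made up for illustration and are not a subset of the murders table:

```r
# Illustrative values only; not taken from the murders dataset
totals <- c(135, 19, 232, 93, 1257)

# Default behavior: increasing order, so the largest value comes last
increasing <- sort(totals)
increasing
#> [1]   19   93  135  232 1257

# decreasing = TRUE puts the largest value first
decreasing <- sort(totals, decreasing = TRUE)
decreasing
#> [1] 1257  232  135   93   19
```

Either way, the output contains only the sorted values, so the link to the states is still lost.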

order

The function order is closer to what we want. It takes a vector as input and returns the vector of indexes that sorts the input vector. This may sound confusing so let’s look at a simple example. We can create a vector and sort it:

x <- c(31, 4, 15, 92, 65)
sort(x)
#> [1]  4 15 31 65 92

Rather than sort the input vector, the function order returns the index that sorts the input vector:

index <- order(x)
x[index]
#> [1]  4 15 31 65 92

This is the same output as that returned by sort(x). If we look at this index, we see why it works:

x
#> [1] 31  4 15 92 65
order(x)
#> [1] 2 3 1 5 4

The second entry of x is the smallest, so order(x) starts with 2. The next smallest is the third entry of x, so the second entry of order(x) is 3, and so on.
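The relationship between sort and order can be checked directly. The sketch below reuses the same x; the names same_as_sort and index_desc are introduced here only for illustration. It also shows that order, like sort, accepts decreasing = TRUE:

```r
x <- c(31, 4, 15, 92, 65)

# Indexing x by order(x) reproduces sort(x)
same_as_sort <- identical(x[order(x)], sort(x))
same_as_sort
#> [1] TRUE

# With decreasing = TRUE the index starts at the largest value (92, position 4)
index_desc <- order(x, decreasing = TRUE)
index_desc
#> [1] 4 5 1 3 2
x[index_desc]
#> [1] 92 65 31 15  4
```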

How does this help us order the states by murders? First, remember that the entries of vectors you access with $ follow the same order as the rows in the table. For example, these two vectors containing state names and abbreviations, respectively, are matched by their order:

murders$state[1:6]
#> [1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
#> [6] "Colorado"
murders$abb[1:6]
#> [1] "AL" "AK" "AZ" "AR" "CA" "CO"

This means we can order the state names by their total murders. We first obtain the index that orders the vectors according to murder totals and then index the state names vector:

ind <- order(murders$total)
murders$abb[ind]
#>  [1] "VT" "ND" "NH" "WY" "HI" "SD" "ME" "ID" "MT" "RI" "AK" "IA" "UT"
#> [14] "WV" "NE" "OR" "DE" "MN" "KS" "CO" "NM" "NV" "AR" "WA" "CT" "WI"
#> [27] "DC" "OK" "KY" "MA" "MS" "AL" "IN" "SC" "TN" "AZ" "NJ" "VA" "NC"
#> [40] "MD" "OH" "MO" "LA" "IL" "GA" "MI" "PA" "NY" "FL" "TX" "CA"

Because the abbreviations are now listed in increasing order of murder totals, the last entry tells us that California had the most murders.
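The same index can reorder any column, or every row of a table at once. Here is a small sketch on a toy data frame; toy, its state labels, and its totals are all hypothetical and stand in for murders so that the example runs without dslabs:

```r
# Toy table; the labels and numbers are made up for illustration
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  total = c(50, 7, 120, 31),
  stringsAsFactors = FALSE
)

# Index that sorts the totals in increasing order
toy_ind <- order(toy$total)

# Applying the index to another column keeps each label paired with its total
toy$state[toy_ind]
#> [1] "B" "D" "A" "C"

# Applying it to the rows reorders the whole table
toy_sorted <- toy[toy_ind, ]
toy_sorted$total
#> [1]   7  31  50 120
```

This works for the reason given above: columns accessed with $ follow the row order of the table, so one index applies to all of them.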

max and which.max

If we are only interested in the entry with the largest value, we can use max for the value:

max(murders$total)
#> [1] 1257

and which.max for the index of the largest value:

i_max <- which.max(murders$total)
murders$state[i_max]
#> [1] "California"

For the minimum, we can use min and which.min in the same way.
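As a brief sketch of the minimum case, and of what happens with ties, consider a made-up vector v (not taken from murders). Note that which.max and which.min report only the first position when a value is tied; which with a comparison is one way to get all of them:

```r
# Made-up values with a tie for the maximum (positions 2 and 4)
v <- c(3, 9, 1, 9)

min(v)
#> [1] 1
i_min <- which.min(v)
i_min
#> [1] 3

# With ties, which.max reports only the first matching position
i_first_max <- which.max(v)
i_first_max
#> [1] 2

# which() applied to a comparison returns every position holding the maximum
i_all_max <- which(v == max(v))
i_all_max
#> [1] 2 4
```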

Does this mean California is the most dangerous state? In an upcoming section, we argue that we should be considering rates instead of totals. Before doing that, we introduce one last order-related function: rank.

rank

Although not as frequently used as order and sort, the function rank is also related to order and can be useful. For any given vector, it returns a vector of the same length giving the rank of each entry of the input vector, with rank 1 assigned to the smallest value. Here is a simple example:

x <- c(31, 4, 15, 92, 65)
rank(x)
#> [1] 3 1 2 5 4

To summarize, let’s look at the results of the three functions we have introduced:

original  sort  order  rank
      31     4      2     3
       4    15      3     1
      15    31      1     2
      92    65      5     5
      65    92      4     4
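Two further properties of rank are worth a quick check. The sketch below assumes base R defaults; the names rank_via_order, tied, and tied_ranks are introduced only for illustration. When there are no ties, applying order twice gives the ranks, and when there are ties, rank by default assigns the average of the tied positions:

```r
x <- c(31, 4, 15, 92, 65)

# Without ties, applying order twice reproduces the ranks
rank_via_order <- order(order(x))
rank_via_order
#> [1] 3 1 2 5 4
all(rank(x) == rank_via_order)
#> [1] TRUE

# With ties, rank averages the tied positions by default
tied <- c(10, 20, 10)
tied_ranks <- rank(tied)
tied_ranks
#> [1] 1.5 3.0 1.5
```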

Beware of recycling

Another common source of unnoticed errors in R is the use of recycling. We saw that vectors are added elementwise. So if the vectors don’t match in length, it is natural to assume that we should get an error. But we don’t. Notice what happens:

x <- c(1,2,3)
y <- c(10, 20, 30, 40, 50, 60, 70)
x+y
#> Warning in x + y: longer object length is not a multiple of shorter
#> object length
#> [1] 11 22 33 41 52 63 71

We do get a warning, but no error. For the output, R has recycled the numbers in x, using them as 1, 2, 3, 1, 2, 3, 1 to match the length of y. Notice the last digit of the numbers in the output, which follows this pattern.
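To make the recycling explicit, we can build the recycled vector ourselves with rep_len and compare. The following is a sketch rather than a recommended pattern: x_recycled and recycled_sum are illustrative names, and add_same_length is a hypothetical helper (not part of base R) showing one possible way to turn a length mismatch into an error:

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30, 40, 50, 60, 70)

# Spell out the recycling: x is repeated until it has as many entries as y
x_recycled <- rep_len(x, length(y))
x_recycled
#> [1] 1 2 3 1 2 3 1

# The explicit version gives the same sum as x + y (warning silenced here)
recycled_sum <- suppressWarnings(x + y)
identical(x_recycled + y, recycled_sum)
#> [1] TRUE

# One possible guard: add only when the lengths agree, otherwise stop
# (add_same_length is a hypothetical helper, not part of base R)
add_same_length <- function(a, b) {
  stopifnot(length(a) == length(b))
  a + b
}
add_same_length(c(1, 2, 3), c(10, 20, 30))
#> [1] 11 22 33
```

Calling add_same_length(x, y) with the vectors above stops with an error instead of silently recycling.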