Conditional expressions are one of the basic features of programming. They are used for what is called flow control. The most common conditional expression is the if-else statement. In R, we can actually perform quite a bit of data analysis without conditionals. However, they do come up occasionally, and you will need them once you start writing your own functions and packages.
Here is a very simple example showing the general structure of an
if-else statement. The basic idea is to print the reciprocal of `a`
unless `a` is 0:
a <- 0
if(a!=0){
print(1/a)
} else{
print("No reciprocal for 0.")
}
#> [1] "No reciprocal for 0."
Let’s look at one more example using the US murders data frame:
library(dslabs)
data(murders)
murder_rate <- murders$total / murders$population*100000
Here is a very simple example that checks whether the state with the
lowest murder rate falls below 0.5 per 100,000 and, if so, prints its
name. The `if` statement protects us from the case in which no state
satisfies the condition.
ind <- which.min(murder_rate)
if(murder_rate[ind] < 0.5){
print(murders$state[ind])
} else{
print("No state has a murder rate that low.")
}
#> [1] "Vermont"
If we try it again with a rate of 0.25, we get a different answer:
if(murder_rate[ind] < 0.25){
print(murders$state[ind])
} else{
print("No state has a murder rate that low.")
}
#> [1] "No state has a murder rate that low."
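Since the two chunks above differ only in the cutoff, the check can be wrapped in a small function. The following is a minimal sketch rather than anything provided by dslabs: the function name `lowest_if_below` and the toy vectors `rates` and `states` are made up for illustration.

```r
# Hypothetical helper: return the name with the lowest rate if that rate
# is below `cutoff`; otherwise return a message.
lowest_if_below <- function(rates, names, cutoff){
  ind <- which.min(rates)          # position of the smallest rate
  if(rates[ind] < cutoff){
    names[ind]
  } else{
    "No state has a murder rate that low."
  }
}

# Toy values, not real data
rates <- c(2.8, 0.3, 5.1)
states <- c("A", "B", "C")

lowest_if_below(rates, states, 0.5)
#> [1] "B"
lowest_if_below(rates, states, 0.25)
#> [1] "No state has a murder rate that low."
```

With the murders data loaded, calling `lowest_if_below(murder_rate, murders$state, 0.5)` should reproduce the first result above.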
A related function that is very useful is `ifelse`. This function takes
three arguments: a logical and two possible answers. If the logical is
`TRUE`, the value in the second argument is returned and if `FALSE`, the
value in the third argument is returned. Here is an example:
a <- 0
ifelse(a > 0, 1/a, NA)
#> [1] NA
The function is particularly useful because it works on vectors. It
examines each entry of the logical vector and returns elements from the
vector provided in the second argument, if the entry is `TRUE`, or
elements from the vector provided in the third argument, if the entry is
`FALSE`.
a <- c(0, 1, 2, -4, 5)
result <- ifelse(a > 0, 1/a, NA)
This table helps us see what happened. Here `is_a_positive` is the logical `a > 0`, `answer1` is `1/a`, and `answer2` is `NA`:
a | is_a_positive | answer1 | answer2 | result |
---|---|---|---|---|
0 | FALSE | Inf | NA | NA |
1 | TRUE | 1.00 | NA | 1.0 |
2 | TRUE | 0.50 | NA | 0.5 |
-4 | FALSE | -0.25 | NA | NA |
5 | TRUE | 0.20 | NA | 0.2 |
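A table like this one can be rebuilt directly in R, which makes the entry-by-entry selection easy to inspect. This is only a sketch: the object name `tab` is an assumption, and the column names are copied from the table above.

```r
a <- c(0, 1, 2, -4, 5)

# One column per ingredient of the ifelse call
tab <- data.frame(
  a = a,
  is_a_positive = a > 0,            # the logical vector (first argument)
  answer1 = 1/a,                    # second argument, computed for every entry
  answer2 = NA,                     # third argument, recycled to full length
  result = ifelse(a > 0, 1/a, NA)   # entry-wise choice between the two
)
tab
```

Note that `1/a` is computed for every entry, including `a = 0`, where it gives `Inf`. `ifelse` simply does not select that value because the corresponding logical is `FALSE`.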
Here is an example of how this function can be readily used to replace all the missing values in a vector with zeros:
data(na_example)
no_nas <- ifelse(is.na(na_example), 0, na_example)
sum(is.na(no_nas))
#> [1] 0
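Because `na_example` is long, the replacement is easier to see on a short vector. Here is a small sketch of the same idea, where the vector `x` is made up for illustration and is not part of dslabs:

```r
# Toy vector with two missing values (made up for illustration)
x <- c(3, NA, 7, NA, 1)

# Where is.na(x) is TRUE take 0, otherwise keep the original entry
no_nas <- ifelse(is.na(x), 0, x)
no_nas
#> [1] 3 0 7 0 1
sum(is.na(no_nas))
#> [1] 0
```

The non-missing entries are carried over unchanged, and only the positions flagged by `is.na` are replaced.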
Two other useful functions are `any` and `all`. The `any` function takes
a vector of logicals and returns `TRUE` if any of the entries is `TRUE`.
The `all` function takes a vector of logicals and returns `TRUE` if all
of the entries are `TRUE`. Here is an example:
z <- c(TRUE, TRUE, FALSE)
any(z)
#> [1] TRUE
all(z)
#> [1] FALSE
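Because `any` and `all` each return a single logical, they fit naturally as the condition of an if-else statement. The sketch below assumes a toy vector `rates`, made up for illustration rather than taken from the murders data:

```r
# Toy values, not real data
rates <- c(2.8, 0.3, 5.1)

# any() collapses the logical vector rates < 0.5 into one TRUE/FALSE
if(any(rates < 0.5)){
  print("At least one rate is below 0.5.")
} else{
  print("No rate is below 0.5.")
}
#> [1] "At least one rate is below 0.5."

# all() is TRUE only when every entry satisfies the condition
if(all(rates < 0.5)){
  print("Every rate is below 0.5.")
} else{
  print("Not every rate is below 0.5.")
}
#> [1] "Not every rate is below 0.5."
```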