In general, coercion is an attempt by R to be flexible with data types. When an entry does not match the expected type, some of the prebuilt R functions try to guess what was meant before throwing an error. This flexibility can also lead to confusion. Failing to understand coercion can drive programmers crazy when they code in R, since it behaves quite differently from most other languages in this regard. Let’s learn about it with some examples.
We said that vectors must be all of the same type. So if we try to combine, say, numbers and characters, you might expect an error:
x <- c(1, "canada", 3)
But we don’t get one, not even a warning! What happened? Look at x and its class:
x
#> [1] "1" "canada" "3"
class(x)
#> [1] "character"
R coerced the data into characters. It guessed that because you put a character string in the vector, you meant the 1 and 3 to actually be character strings "1" and "3". The fact that not even a warning is issued is an example of how coercion can cause many unnoticed errors in R.
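As a minimal sketch of the behavior described above, the following shows that c() silently converts mixed inputs to the most flexible type present. The variable names and the logical example are illustrative additions, not part of the original example; checking the class before using a vector is one simple way to catch silent coercion.

```r
# Illustrative names; only the "canada" vector comes from the text above.
mixed_lgl_num <- c(TRUE, 2, 3)      # logical + numeric
mixed_num_chr <- c(1, "canada", 3)  # numeric + character

class(mixed_lgl_num)  # TRUE was coerced to the number 1
#> [1] "numeric"
class(mixed_num_chr)  # the numbers were coerced to strings
#> [1] "character"

# A quick check that a vector still holds numbers before doing arithmetic
is.numeric(mixed_num_chr)
#> [1] FALSE
```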
R also offers functions to change from one type to another. For example, you can turn numbers into characters with:
x <- 1:5
y <- as.character(x)
y
#> [1] "1" "2" "3" "4" "5"
You can turn it back with as.numeric:
as.numeric(y)
#> [1] 1 2 3 4 5
This function is actually quite useful since datasets that include numbers as character strings are common.
When a function tries to coerce one type to another and encounters an impossible case, it usually gives us a warning and turns the entry into a special value called an NA for “not available”. For example:
x <- c("1", "b", "3")
as.numeric(x)
#> Warning: NAs introduced by coercion
#> [1] 1 NA 3
R does not have any guesses for what number you want when you type b, so it does not try.
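When a conversion produces NAs, it is often useful to find out which entries caused them. The following is one possible sketch, not the only approach: it compares the converted vector against the original to locate entries that were present but could not be turned into numbers. The names converted and failed are hypothetical.

```r
x <- c("1", "b", "3")

# The warning is suppressed here only because we inspect the NAs ourselves.
converted <- suppressWarnings(as.numeric(x))

# An entry failed if it became NA but was not NA to begin with.
failed <- is.na(converted) & !is.na(x)

x[failed]      # the offending entries
#> [1] "b"
which(failed)  # their positions
#> [1] 2
```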
As a data scientist you will encounter NAs often, as they are generally used for missing data, a common problem in real-world datasets.
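As a brief illustration of what working with missing data can look like, the sketch below uses a made-up vector (the values are invented for this example) to show how is.na identifies NAs and how a summary such as mean returns NA unless told to ignore them with na.rm = TRUE.

```r
scores <- c(10, NA, 20)  # made-up values with one missing entry

is.na(scores)
#> [1] FALSE  TRUE FALSE

mean(scores)                # any NA makes the result NA by default
#> [1] NA
mean(scores, na.rm = TRUE)  # drop NAs before averaging
#> [1] 15
```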