In R, the most basic objects available to store data are vectors. As we have seen, complex datasets can usually be broken down into components that are vectors. For example, in a data frame, each column is a vector. Here we learn more about this important class.
We can create vectors using the function c, which stands for concatenate. We use c to concatenate entries in the following way:
codes <- c(380, 124, 818)
codes
#> [1] 380 124 818
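Because c concatenates, its arguments can themselves be vectors, and the result is a single flat vector. Below is a minimal sketch; the extra entry 840 and the name more_codes are illustrative and not part of the example above.

```r
codes <- c(380, 124, 818)

# c() accepts existing vectors as arguments and returns one flat vector
# (840 is a hypothetical extra entry used only for illustration)
more_codes <- c(codes, 840)
more_codes
#> [1] 380 124 818 840

length(more_codes)
#> [1] 4
```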
We can also create character vectors. We use quotes to denote that the entries are characters rather than variable names.
country <- c("italy", "canada", "egypt")
In R you can also use single quotes:
country <- c('italy', 'canada', 'egypt')
But be careful not to confuse the single quote ' with the back quote `.
By now you should know that if you type:
country <- c(italy, canada, egypt)
you receive an error because the variables italy, canada, and egypt are not defined. If we do not use the quotes, R looks for variables with those names and returns an error.
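To see the difference concretely, the sketch below captures the error message with tryCatch instead of stopping. It assumes a fresh session in which no object named italy exists. The name msg is illustrative, and the exact wording of the message depends on R's language settings.

```r
# Without quotes, R tries to look up an object called italy
# (assumes a fresh session where no such object is defined)
msg <- tryCatch(
  c(italy, canada, egypt),
  error = function(e) conditionMessage(e)
)
# msg now holds the error text, which names the missing object italy
msg

# With quotes, the entries are treated as character strings
country <- c("italy", "canada", "egypt")
class(country)
#> [1] "character"
```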
Sometimes it is useful to name the entries of a vector. For example, when defining a vector of country codes, we can use the names to connect each code to its country:
codes <- c(italy = 380, canada = 124, egypt = 818)
codes
#>  italy canada  egypt
#>    380    124    818
The object codes continues to be a numeric vector:
class(codes)
#> [1] "numeric"
but with names:
names(codes)
#> [1] "italy"  "canada" "egypt"
If the use of strings without quotes looks confusing, know that you can use the quotes as well:
codes <- c("italy" = 380, "canada" = 124, "egypt" = 818)
codes
#>  italy canada  egypt
#>    380    124    818
There is no difference between this function call and the previous one. This is one of the many ways in which R is quirky compared to other languages.
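One way to check that the two calls give the same result is to compare the objects with identical. This is a small sketch; the names unquoted and quoted are illustrative.

```r
unquoted <- c(italy = 380, canada = 124, egypt = 818)
quoted   <- c("italy" = 380, "canada" = 124, "egypt" = 818)

# identical() compares values, names, and type
identical(unquoted, quoted)
#> [1] TRUE
```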
We can also assign names using the names function:
codes <- c(380, 124, 818)
country <- c("italy","canada","egypt")
names(codes) <- country
codes
#>  italy canada  egypt
#>    380    124    818
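The names function both reads and sets names. As a short sketch of that behavior, an unnamed vector reports NULL, and assigning names leaves the values and class unchanged:

```r
codes <- c(380, 124, 818)

# A vector created without names has none
names(codes)
#> NULL

# Assigning to names(codes) attaches labels to the entries
names(codes) <- c("italy", "canada", "egypt")

# The class is still numeric after naming
class(codes)
#> [1] "numeric"
```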
Another useful function for creating vectors generates sequences:
seq(1, 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10
The first argument defines the start, and the second defines the end, which is included. The default is to go up in increments of 1, but a third argument lets us specify the increment:
seq(1, 10, 2)
#> [1] 1 3 5 7 9
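The increment can also be passed by name, and seq has a length.out argument for requesting a number of equally spaced values instead. The sketch below shows both; the variable names are illustrative.

```r
# Name the increment explicitly with by
odds <- seq(1, 10, by = 2)
odds
#> [1] 1 3 5 7 9

# A negative increment counts down
countdown <- seq(10, 1, by = -3)
countdown
#> [1] 10  7  4  1

# length.out asks for a given number of equally spaced values
spaced <- seq(1, 10, length.out = 4)
spaced
#> [1]  1  4  7 10
```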
If we want consecutive integers, we can use the following shorthand:
1:10
#>  [1]  1  2  3  4  5  6  7  8  9 10
When we use 1:10 or seq(1, 10), R produces integers, not numerics, because these sequences are typically used to index something:
class(1:10)
#> [1] "integer"
However, if we create a sequence including non-integers, the class changes:
class(seq(1, 10, 0.5))
#> [1] "numeric"
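The class depends on how the sequence is specified, not only on whether the resulting values are whole numbers. The sketch below compares a few calls; the L suffix, which marks a literal as an integer, is not introduced above and is used here only for illustration.

```r
# Only start and end supplied: equivalent to 1:10
class(seq(1, 10))
#> [1] "integer"

# Increment supplied as the numeric 2: the result is numeric,
# even though every value is a whole number
class(seq(1, 10, 2))
#> [1] "numeric"

# All arguments given as integers (L suffix): the result is integer
class(seq(1L, 10L, 2L))
#> [1] "integer"
```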
We use square brackets to access specific elements of a vector. For the vector codes we defined above, we can access the second element using:
codes[2]
#> canada
#>    124
You can get more than one entry by using a multi-entry vector as an index:
codes[c(1,3)]
#> italy egypt
#>   380   818
The sequences defined above are particularly useful if we want to access, say, the first two elements:
codes[1:2]
#>  italy canada
#>    380    124
If the elements have names, we can also access the entries using these names. Below are two examples.
codes["canada"]
#> canada
#>    124
codes[c("egypt","italy")]
#> egypt italy
#>   818   380