In R, the most basic objects available to store data are vectors. As we have seen, complex datasets can usually be broken down into components that are vectors. For example, in a data frame, each column is a vector. Here we learn more about this important class.

Creating vectors

We can create vectors using the function c, which stands for concatenate. We use c to concatenate entries in the following way:

codes <- c(380, 124, 818)
codes
#> [1] 380 124 818
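
Because c concatenates, it also accepts existing vectors and joins them end to end into one flat vector. The following is a minimal sketch; the names more_codes and all_codes and the two extra values are made up for illustration:

```r
codes <- c(380, 124, 818)
more_codes <- c(840, 484)           # hypothetical extra entries
all_codes <- c(codes, more_codes)   # one flat vector of five entries
all_codes
#> [1] 380 124 818 840 484
length(all_codes)
#> [1] 5
```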

We can also create character vectors. We use quotes to denote that the entries are character strings rather than variable names.

country <- c("italy", "canada", "egypt")

In R you can also use single quotes:

country <- c('italy', 'canada', 'egypt')

But be careful not to confuse the single quote ' with the back quote `.

By now you should know that if you type:

country <- c(italy, canada, egypt)

you receive an error: without the quotes, R looks for variables named italy, canada, and egypt, finds that they are not defined, and stops.
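
To look at the error without halting a script, one option is to wrap the call in tryCatch from base R. This is only a sketch: the variable name msg is ours, and the exact wording of the message can vary with the R version and locale, so no output is shown. It also assumes that no variable called italy exists in the session.

```r
# Without quotes, R treats italy, canada, and egypt as variable names.
# tryCatch captures the resulting error and returns its message as a string.
msg <- tryCatch(
  c(italy, canada, egypt),
  error = function(e) conditionMessage(e)
)
msg

# With quotes, the same call succeeds.
country <- c("italy", "canada", "egypt")
```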

Names

Sometimes it is useful to name the entries of a vector. For example, when defining a vector of country codes, we can use names to attach each country to its code:

codes <- c(italy = 380, canada = 124, egypt = 818)
codes
#>  italy canada  egypt
#>    380    124    818

The object codes continues to be a numeric vector:

class(codes)
#> [1] "numeric"

but with names:

names(codes)
#> [1] "italy"  "canada" "egypt"

If the use of strings without quotes looks confusing, know that you can use the quotes as well:

codes <- c("italy" = 380, "canada" = 124, "egypt" = 818)
codes
#>  italy canada  egypt
#>    380    124    818

There is no difference between this function call and the previous one. This is one of the many ways in which R is quirky compared to other languages.

We can also assign names using the names function:

codes <- c(380, 124, 818)
country <- c("italy","canada","egypt")
names(codes) <- country
codes
#>  italy canada  egypt
#>    380    124    818
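
The names function works in both directions: it reads the names and, on the left of an assignment, sets them. As a small sketch, the code below checks that a fresh vector has no names, relabels a single entry, and strips the names again with unname from base R. The label "CAN" is a hypothetical value chosen for illustration.

```r
codes <- c(380, 124, 818)
is.null(names(codes))            # a vector built this way has no names yet
#> [1] TRUE
names(codes) <- c("italy", "canada", "egypt")
names(codes)[2] <- "CAN"         # hypothetical relabel of the second entry
names(codes)
#> [1] "italy" "CAN"   "egypt"
unname(codes)                    # same values, names removed
#> [1] 380 124 818
```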

Sequences

Another useful function for creating vectors generates sequences:

seq(1, 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10

The first argument defines the start, and the second defines the end, which is included. The default is to go up in increments of 1, but a third argument lets us specify the size of the jump:

seq(1, 10, 2)
#> [1] 1 3 5 7 9

If we want consecutive integers, we can use the following shorthand:

1:10
#>  [1]  1  2  3  4  5  6  7  8  9 10

When we use 1:10 or seq(1, 10), R produces integers, not numerics, because these sequences are typically used to index something:

class(1:10)
#> [1] "integer"

However, if we create a sequence including non-integers, the class changes:

class(seq(1, 10, 0.5))
#> [1] "numeric"
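
The sketch below shows how the arguments affect the class of the result. It also uses the length.out argument of seq, which is not used above: it asks for a given number of equally spaced values instead of a given increment. The variable name x is ours.

```r
class(seq(1, 10))         # no increment supplied: same as 1:10
#> [1] "integer"
class(seq(1, 10, 2))      # increment given as the double 2
#> [1] "numeric"
class(seq(1L, 10L, 2L))   # the L suffix marks integer literals
#> [1] "integer"
x <- seq(0, 100, length.out = 5)
x
#> [1]   0  25  50  75 100
```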

Subsetting

We use square brackets to access specific elements of a vector. For the vector codes we defined above, we can access the second element using:

codes[2]
#> canada
#>    124

You can get more than one entry by using a multi-entry vector as an index:

codes[c(1,3)]
#> italy egypt
#>   380   818

The sequences defined above are particularly useful if we want to access, say, the first two elements:

codes[1:2]
#>  italy canada
#>    380    124

If the elements have names, we can also access the entries using these names. Below are two examples.

codes["canada"]
#> canada
#>    124
codes[c("egypt","italy")]
#> egypt italy
#>   818   380
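
As a quick check that positions and names are two routes to the same entries, the sketch below compares them with identical from base R. The variable wanted is a name we introduce for illustration.

```r
codes <- c(italy = 380, canada = 124, egypt = 818)

# Position 2 and the name "canada" pick out the same named entry
identical(codes[2], codes["canada"])
#> [1] TRUE

# An index vector can be stored in a variable; the result follows its order
wanted <- c("egypt", "italy")
codes[wanted]
#> egypt italy
#>   818   380
identical(codes[wanted], codes[c(3, 1)])
#> [1] TRUE
```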