R-base also provides import functions. These have names similar to those in the tidyverse, for example read.table, read.csv, and read.delim. However, there are a couple of important differences. To show this, we read in the data with an R-base function:

dat2 <- read.csv(filename)

An important difference is that characters are converted to factors:

class(dat2$abb)
#> [1] "factor"
class(dat2$region)
#> [1] "factor"

This can be avoided by setting the argument stringsAsFactors to FALSE. Note that since R version 4.0.0, stringsAsFactors = FALSE is the default, so this conversion only happens in earlier versions of R.

dat <- read.csv(filename, stringsAsFactors = FALSE)
class(dat$state)
#> [1] "character"

In our experience this can be a cause for confusion, since a variable that was saved as characters in the file is converted to factors regardless of what the variable represents. In fact, we highly recommend making stringsAsFactors = FALSE your default approach when using the R-base parsers. You can easily convert the desired columns to factors after importing the data, as shown below.
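
For example, if we want region to be a factor, a minimal sketch, assuming dat was imported as above, is:

dat$region <- as.factor(dat$region)
class(dat$region)
#> [1] "factor"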

scan

When reading in spreadsheets, many things can go wrong. The file might have a multiline header, be missing cells, or it might use an unexpected encoding. We recommend you read this post about common issues with encodings and character sets: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
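
As a sketch of how two of these problems can be handled, read.csv accepts the skip and fileEncoding arguments; the file name and values below are hypothetical and would need to be adapted to the file at hand:

## skip two hypothetical extra header lines and declare a Latin-1 encoding
dat3 <- read.csv("messy-file.csv", skip = 2, fileEncoding = "latin1")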

With experience you will learn how to deal with different challenges. Carefully reading the help files for the functions discussed here will be useful. With scan you can read in each cell of a file. Here is an example:

path <- system.file("extdata", package = "dslabs")
filename <- "murders.csv"
x <- scan(file.path(path, filename), sep = ",", what = "c")
x[1:10]
#>  [1] "state"      "abb"        "region"     "population" "total"     
#>  [6] "Alabama"    "AL"         "South"      "4779736"    "135"
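
Because scan returns one long vector containing every cell, we often need to reshape the result ourselves. Here is a minimal sketch, assuming we know the file has five columns, that rebuilds a character matrix with the header row as column names:

cells <- matrix(x[-(1:5)], ncol = 5, byrow = TRUE)  # drop the 5 header cells
colnames(cells) <- x[1:5]                           # use them as column names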

Note that the tidyverse's readr package provides read_lines, a similarly useful function that reads a file one line at a time into a character vector.
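
A minimal sketch of its use, assuming the path and filename objects defined above:

library(readr)
read_lines(file.path(path, filename), n_max = 3)  # read only the first three lines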