In this section we introduce the main tidyverse data importing functions. We will use the murders.csv file provided by the dslabs package as an example. To simplify the illustration we will copy the file to our working directory using the following code:

filename <- "murders.csv"
dir <- system.file("extdata", package = "dslabs") 
fullpath <- file.path(dir, filename)
file.copy(fullpath, "murders.csv")

readr

The readr library includes functions for reading data stored in text file spreadsheets into R. readr is part of the tidyverse package, or you can load it directly:

library(readr)

The following functions are available to read-in spreadsheets:

Function Format Typical suffix
read_table white space separated values txt
read_csv comma separated values csv
read_csv2 semicolon separated values csv
read_tsv tab delimited separated values tsv
read_delim general text file format, must define delimiter txt

Although the suffix usually tells us what type of file it is, there is no guarantee that these always match. We can open the file to take a look or use the function read_lines to look at a few lines:

read_lines("murders.csv", n_max = 3)
#> [1] "state,abb,region,population,total"
#> [2] "Alabama,AL,South,4779736,135"     
#> [3] "Alaska,AK,West,710231,19"

This also shows that there is a header. Now we are ready to read-in the data into R. From the .csv suffix and the peek at the file, we know to use read_csv:

dat <- read_csv(filename)
#> Parsed with column specification:
#> cols(
#>   state = col_character(),
#>   abb = col_character(),
#>   region = col_character(),
#>   population = col_double(),
#>   total = col_double()
#> )

Note that we receive a message letting us know what data types were used for each column. Also note that dat is a tibble, not just a data frame. This is because read_csv is a tidyverse parser. We can confirm that the data has in fact been read-in with:

View(dat)

Finally, note that we can also use the full path for the file:

dat <- read_csv(fullpath)

readxl

You can load the readxl package using

library(readxl)

The package provides functions to read-in Microsoft Excel formats:

Function Format Typical suffix
read_excel auto detect the format xls, xlsx
read_xls original format xls
read_xlsx new format xlsx

The Microsoft Excel formats permit you to have more than one spreadsheet in one file. These are referred to as sheets. The functions listed above read the first sheet by default, but we can also read the others. The excel_sheets function gives us the names of all the sheets in an Excel file. These names can then be passed to the sheet argument in the three functions above to read sheets other than the first.