When we use the read_csv function from the readr package (part of the tidyverse) to read each of the csv files listed in the filenames object created by the code below, we notice that olive.csv produces a warning. This is because the first line of the file, the header, is missing the name for the first column: the header has only 11 fields, while each data row has 12.

path <- system.file("extdata", package = "dslabs")
filenames <- list.files(path)
filenames
#[1] "2010_bigfive_regents.xls"                               
#[2] "carbon_emissions.csv"                                   
#[3] "fertility-two-countries-example.csv"                    
#[4] "HRlist2.txt"                                            
#[5] "life-expectancy-and-fertility-two-countries-example.csv"
#[6] "murders.csv"                                            
#[7] "olive.csv"                                              
#[8] "RD-Mortality-Report_2015-18-180531.pdf"                 
#[9] "ssa-death-probability.csv" 

dat <- read_csv(file.path(path, filenames[7]))
# Parsed with column specification:
# cols(
#   X1 = col_double(),
#   Region = col_character(),
#   Area = col_double(),
#   palmitic = col_double(),
#   palmitoleic = col_double(),
#   stearic = col_double(),
#   oleic = col_double(),
#   linoleic = col_double(),
#   linolenic = col_double(),
#   arachidic = col_double(),
#   eicosenoic = col_double()
# )
# Warning: 572 parsing failures.
# row col   expected actual     file
#   1  -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
#   2  -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
#   3  -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
#   4  -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
#   5  -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
# ... ... .......... .......... ..............................
# See problems(...) for more details.
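To see where the mismatch comes from, one quick check is to count the fields on each line. The sketch below uses base R's count.fields on a small hypothetical file (tmp and its contents are made up for illustration, not taken from olive.csv) whose header is one name short, mirroring the 11 versus 12 mismatch in the warning above.

```r
# Hypothetical toy file: the header names 3 columns,
# but every data row carries 4 comma-separated fields.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,x,10,20",
             "2,y,30,40"), tmp)

# Count the comma-separated fields on each line of the file.
n_fields <- count.fields(tmp, sep = ",")
n_fields
```

Running the same call on file.path(path, filenames[7]) should show the same pattern for olive.csv: one fewer field on the first line than on the lines that follow.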

1. Read the help file for read_csv to figure out how to read the olive.csv file without reading its header. If you skip the header, you should not get this warning. Save the result to an object called dat.

Hint: use the skip and col_names arguments. Use the help function in RStudio to figure out how to use them.
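A minimal sketch of how the two arguments behave, again on a hypothetical toy file whose header is one name short (the file and its contents are made up for illustration; the sketch assumes the readr package is installed). skip = 1 tells read_csv to ignore the first line, and col_names = FALSE tells it not to treat the next line as a header, so it generates default names (X1, X2, ...) instead.

```r
library(readr)

# Hypothetical toy file: header has 3 names, data rows have 4 fields.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,x,10,20",
             "2,y,30,40"), tmp)

# skip = 1 discards the malformed header line;
# col_names = FALSE makes read_csv generate column names
# rather than reading them from the new first line.
dat <- read_csv(tmp, skip = 1, col_names = FALSE)
dim(dat)   # 2 rows, 4 columns
names(dat) # default names X1 to X4
```

For olive.csv, the same two arguments go into the read_csv call on file.path(path, filenames[7]).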

2. A problem with the previous approach is that we don’t know what the columns represent. Type:

names(dat)

to see that the names are not informative.

Use the readLines function to read in just the first line of the file (we will learn later how to extract values from the output). Store the result in header_line.

Clarification:

The first lines of your script should be:

library(readr)
path <- system.file("extdata", package = "dslabs")
filenames <- list.files(path)
dat <- read_csv(file.path(path, filenames[7]))

You will need to add the skip and col_names arguments to read_csv.

readLines should be called on file.path(path, filenames[7]), not on dat.
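readLines returns one character string per line of the file, and its n argument caps how many lines are read, so n = 1 returns only the header line. A minimal sketch on a hypothetical toy file (its contents are made up for illustration, not taken from olive.csv):

```r
# Hypothetical toy file whose header has an empty first field.
tmp <- tempfile(fileext = ".csv")
writeLines(c(",Region,Area",
             "1,North,1,1"), tmp)

# n = 1 reads only the first line, returned as a length-one character vector.
header_line <- readLines(tmp, n = 1)
header_line
```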