When we use the read_csv function from the tidyverse library to read each of the csv files that the following code saves in the filenames object, we notice that the olive.csv file gives us a warning. This is because the first line of the file is missing the header for the first column.
path <- system.file("extdata", package = "dslabs")
filenames <- list.files(path)
filenames
#[1] "2010_bigfive_regents.xls"
#[2] "carbon_emissions.csv"
#[3] "fertility-two-countries-example.csv"
#[4] "HRlist2.txt"
#[5] "life-expectancy-and-fertility-two-countries-example.csv"
#[6] "murders.csv"
#[7] "olive.csv"
#[8] "RD-Mortality-Report_2015-18-180531.pdf"
#[9] "ssa-death-probability.csv"
dat <- read_csv(file.path(path, filenames[7]))
# Parsed with column specification:
# cols(
# X1 = col_double(),
# Region = col_character(),
# Area = col_double(),
# palmitic = col_double(),
# palmitoleic = col_double(),
# stearic = col_double(),
# oleic = col_double(),
# linoleic = col_double(),
# linolenic = col_double(),
# arachidic = col_double(),
# eicosenoic = col_double()
# )
# Warning: 572 parsing failures.
# row col expected actual file
# 1 -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
# 2 -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
# 3 -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
# 4 -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
# 5 -- 11 columns 12 columns '.../dslabs/extdata/olive.csv'
# ... ... .......... .......... ..............................
# See problems(...) for more details.
1. Read the help file for read_csv to figure out how to read in the olive.csv file without reading this header. If you skip the header, you should not get this warning. Save the result to an object called dat.
Hint: use the skip and col_names options. Use the help function in RStudio to figure out how to use them.
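As a minimal sketch of how skip and col_names behave, the example below uses a small made-up file rather than olive.csv. The file contents and the names toy_path and toy are assumptions for illustration only: the header line names two columns while each data row has three fields, which mimics the mismatch reported in the warning above.

```r
library(readr)

# Hypothetical toy file (not part of dslabs): the header names only two
# columns, but every data row has three fields.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Region,Area", "1,North,10", "2,South,20"), toy_path)

# skip = 1 drops the incomplete header line.
# col_names = FALSE tells read_csv not to treat the next line as a header,
# so every remaining line is read as data.
toy <- read_csv(toy_path, skip = 1, col_names = FALSE)

# Inspect the automatically generated column names.
names(toy)
```

With col_names = FALSE, readr generates default column names (X1, X2, and so on), so the data are read cleanly but the columns carry no descriptive labels.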
2. A problem with the previous approach is that we don’t know what the columns represent. Type:
names(dat)
to see that the names are not informative.
Use the readLines function to read in just the first line (we will later learn how to extract values from the output). Store your result in header_line.
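A minimal sketch of reading only the first line of a file with base R's readLines, again on a made-up file rather than olive.csv. The names toy_path and first_line and the file contents are assumptions for illustration; the n argument limits how many lines are read.

```r
# Hypothetical toy file (not part of dslabs), created only for this example.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Region,Area", "1,North,10", "2,South,20"), toy_path)

# readLines takes a file path (or connection), not a data frame.
# n = 1 reads just the first line and returns it as a single string.
first_line <- readLines(toy_path, n = 1)
first_line
# [1] "Region,Area"
```

The result is one character string containing the whole header line, commas included, so it still needs to be split before it can be used as column names.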
Clarification:
The first lines of your script should be:
library(readr)
path <- system.file("extdata", package = "dslabs")
filenames <- list.files(path)
dat <- read_csv(file.path(path, filenames[7]))
You will need to add the skip and col_names arguments to read_csv.
readLines should be used with file.path(path, filenames[7]), not on dat.