The first step when importing data from a spreadsheet is to locate the file containing the data. Although we do not recommend it, you can use an approach similar to opening files in Microsoft Excel: click on the RStudio “File” menu, click “Import Dataset”, then click through folders until you find the file. We prefer to write code rather than use the point-and-click approach.
The main challenge in this first step is that we need to let the R functions know where to look for the file containing the data. The simplest way to do this is to have a copy of the file in the folder in which the importing functions look by default. Once we do this, all we have to supply to the importing function is the filename.
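As a minimal sketch of this idea, the code below creates a small file in the folder where R looks by default and then reads it back using only the filename. The name example-data.csv and its contents are hypothetical and used purely for illustration; read.csv is a base R importing function chosen here as an assumption, not necessarily the one you will use.

```r
# Hypothetical filename, used only for illustration.
filename <- "example-data.csv"

# Create a small file in the default folder (no directory is given).
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), filename, row.names = FALSE)

# Because the file sits in the default folder, the filename alone is enough.
dat <- read.csv(filename)
dat

# Remove the illustrative file so the folder is left as it was.
removed <- file.remove(filename)
```

If the file were stored anywhere else, the filename alone would not be enough and R would report that the file cannot be found.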
You can think of your computer’s filesystem as a series of nested folders, each containing other folders and files. Data scientists refer to folders as directories. We refer to the folder that contains all other folders as the root directory. We refer to the directory in which we are currently located as the working directory. The working directory therefore changes as you move through folders: think of it as your current location.
The path of a file is a list of directory names that can be thought of as instructions on what folders to click on, and in what order, to find the file. If these instructions are for finding the file from the root directory we refer to it as the full path. If the instructions are for finding the file starting in the working directory we refer to it as a relative path.
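The distinction can be sketched in a few lines of R. The folder data and the file murders.csv are hypothetical names chosen for illustration, and the file does not need to exist for the code to run. The base R function getwd returns the full path of the working directory, and file.path joins directory names with slashes.

```r
# Hypothetical relative path: "murders.csv" inside a folder "data"
# that sits in the working directory.
rel_path <- file.path("data", "murders.csv")
rel_path

# Prepending the full path of the working directory gives a full path
# to the same (hypothetical) file.
full_path <- file.path(getwd(), rel_path)
full_path
```

The relative path is the same on every computer, while the full path depends on where the working directory happens to be.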
To see an example of a full path on your system type the following:
system.file(package = "dslabs")
The strings separated by slashes are the directory names. The first
slash represents the root directory and we know this is a full path
because it starts with a slash (on Windows, a full path instead starts with a drive letter such as C:/). If the first directory name appears
without a slash in front, then the path is assumed to be relative. We
can use the function list.files to see examples of relative paths.
dir <- system.file(package = "dslabs")
list.files(path = dir)
#> [1] "data" "DESCRIPTION" "extdata" "help"
#> [5] "html" "INDEX" "Meta" "NAMESPACE"
#> [9] "R" "script"
These relative paths give us the location of the files or directories if
we start in the directory with the full path. For example, the full path
to the help directory in the example above is:
/Library/Frameworks/R.framework/Versions/3.5/Resources/library/dslabs/help
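The relationship between the relative paths and the full path can be checked directly. The sketch below assumes the stats package as a stand-in for dslabs, only because stats ships with every R installation; the variable names are illustrative. It uses the full.names argument of list.files and the base R function file.path.

```r
# Assumption: "stats" stands in for "dslabs" so the code runs without
# installing anything.
pkg_dir <- system.file(package = "stats")

# Default behavior: relative paths, given relative to pkg_dir.
rel <- list.files(path = pkg_dir)

# full.names = TRUE prepends pkg_dir to each entry, giving full paths.
full <- list.files(path = pkg_dir, full.names = TRUE)

# Building one full path by hand from the directory and a relative path.
description_path <- file.path(pkg_dir, "DESCRIPTION")
description_path
```

The hand-built path should match one of the entries returned with full.names = TRUE.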
We highly recommend only writing relative paths in your code. The reason
is that full paths are unique to your computer and you want your code to
be portable. You can get the full path of your working directory without
writing it out explicitly by using the getwd function.
wd <- getwd()
If you need to change your working directory, you can use the function
setwd, or you can change it through RStudio by clicking on “Session” and
then “Set Working Directory”.
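A minimal sketch of changing the working directory in code follows. It relies on the fact that setwd invisibly returns the previous working directory, which makes it easy to go back. The temporary directory from tempdir is used here only as a convenient destination that exists on every system; it is an assumption for illustration, not a recommended location for data.

```r
# Move to a temporary directory and remember where we started.
# setwd() invisibly returns the working directory in effect before the change.
old_wd <- setwd(tempdir())

# The working directory is now the temporary directory.
getwd()

# Return to the original working directory.
setwd(old_wd)
restored_wd <- getwd()
```

Restoring the original directory after a temporary change keeps the relative paths in the rest of a script pointing at the same files.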