In this lab, we explore the resampling techniques covered in this chapter. Some of the commands in this lab may take a while to run on your computer.
We explore the use of the validation set approach in order to estimate the
test error rates that result from fitting various linear models on the Auto
data set.
Before we begin, we use the set.seed() function to set a seed for
R’s random number generator, so that you will obtain
precisely the same results as those shown below. It is generally a good idea
to set a random seed when performing an analysis such as cross-validation
that contains an element of randomness, so that the results obtained can
be reproduced precisely at a later time.
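As a minimal sketch of why the seed matters, the snippet below draws a random sample twice after resetting the seed, and once more without resetting it. The object names (first_draw, second_draw, third_draw) are illustrative and not part of the lab.

```r
# Resetting the seed replays the same sequence of random numbers.
set.seed(1)
first_draw <- sample(10, 5)

set.seed(1)
second_draw <- sample(10, 5)

# No reset here, so this draw continues the random stream instead.
third_draw <- sample(10, 5)

identical(first_draw, second_draw)  # TRUE: same seed, same sample
third_draw                          # generally a different sample
```

The same reasoning applies to any step that relies on randomness, such as the train/validation split used in this lab.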
We begin by using the sample() function to split the set of observations
into two halves,
by selecting a random subset of 196 observations out of
the original 392 observations. We refer to these observations as the training
set.
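library(ISLR2)             # provides the Auto data set
set.seed(1)                # fix the seed so the split is reproducible
train <- sample(392, 196)  # row indices of the 196 training observations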
(Here we use a shortcut in the sample() command: when its first argument is
a single number n, it samples from the integers 1 to n; see ?sample for details.)
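The shortcut can be checked directly. The sketch below, with illustrative object names short_form and long_form, draws the split both ways under the same seed and confirms that the two calls return the same indices.

```r
set.seed(1)
short_form <- sample(392, 196)    # a single number n means "sample from 1:n"

set.seed(1)
long_form <- sample(1:392, 196)   # the same request written out in full

identical(short_form, long_form)  # TRUE
length(unique(short_form))        # 196: no index is drawn twice
```

Because sample() draws without replacement by default, the 196 indices are distinct, so the remaining 196 observations form the other half.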
We then use the subset option in lm() to fit a linear regression using only
the observations corresponding to the training set.
lm.fit <- lm(mpg ~ horsepower, data = Auto, subset = train)
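To make the role of subset concrete, here is a hedged sketch on R's built-in mtcars data, used as a stand-in so that the code runs without ISLR2. In it, hp plays the role of horsepower, and the names cars_train, fit_subset, fit_rows, and val_mse are illustrative. The sketch checks that subset gives the same fit as indexing the rows directly, then computes the mean squared error on the held-out observations, which is the validation set estimate of the test error.

```r
# Stand-in data: mtcars ships with base R (32 cars; mpg and hp columns).
set.seed(1)
n <- nrow(mtcars)
cars_train <- sample(n, n / 2)   # random half used for training

# Two equivalent ways to fit on the training rows only.
fit_subset <- lm(mpg ~ hp, data = mtcars, subset = cars_train)
fit_rows   <- lm(mpg ~ hp, data = mtcars[cars_train, ])
all.equal(coef(fit_subset), coef(fit_rows))   # TRUE

# Validation set MSE: predict for every row, keep only the held-out rows.
pred <- predict(fit_subset, newdata = mtcars)
val_mse <- mean((mtcars$mpg - pred)[-cars_train]^2)
val_mse
```

The negative index -cars_train selects exactly the observations that were not used to fit the model.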
Below, we have set a different seed, which changes the training sample
that the same code produces.
Repeat the steps illustrated above and use the summary()
function to see how the two fitted models differ.
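One way to organize such a comparison is sketched here. It uses the built-in mtcars data as a stand-in (an assumption made so the code runs without ISLR2), and fit_with_seed() is a hypothetical helper, not a function from the lab. For the exercise itself, replace mtcars with Auto and hp with horsepower.

```r
# Hypothetical helper: draw a training half under a given seed and fit on it.
fit_with_seed <- function(seed, data = mtcars) {
  set.seed(seed)
  train <- sample(nrow(data), nrow(data) %/% 2)
  lm(mpg ~ hp, data = data, subset = train)
}

fit_a <- fit_with_seed(1)
fit_b <- fit_with_seed(2)

# Coefficients side by side; they typically differ because the halves differ.
rbind(seed_1 = coef(fit_a), seed_2 = coef(fit_b))

# summary() also reports standard errors, R-squared, and so on for each fit.
summary(fit_a)$r.squared
summary(fit_b)$r.squared
```

Calling summary(fit_a) and summary(fit_b) in full prints the complete coefficient tables for each model.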
Assume that the ISLR2 library has been loaded and that the Auto
data set has been loaded and attached.