In this lab, we explore the resampling techniques covered in this chapter. Some of the commands in this lab may take a while to run on your computer.

We explore the use of the validation set approach in order to estimate the test error rates that result from fitting various linear models on the Auto data set. Before we begin, we use the set.seed() function in order to set a seed for R’s random number generator, so that you will obtain precisely the same results as those shown below. It is generally a good idea to set a random seed when performing an analysis such as cross-validation that contains an element of randomness, so that the results obtained can be reproduced precisely at a later time. We begin by using the sample() function to split the set of observations into two halves, by selecting a random subset of 196 observations out of the original 392 observations. We refer to these observations as the training set.

library(ISLR2)
set.seed(1)
train <- sample(392, 196)
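As a quick sanity check on the split, the following sketch confirms that train holds 196 distinct row indices and builds the complementary set of held-out rows. The names n and validation are introduced here for illustration only and are not part of the lab.

```r
library(ISLR2)

set.seed(1)
n <- nrow(Auto)                            # 392 observations in Auto
train <- sample(n, n / 2)                  # same call as sample(392, 196)
validation <- setdiff(seq_len(n), train)   # hypothetical name: rows not in train

length(train)                              # size of the training set
length(validation)                         # size of the held-out set
anyDuplicated(train)                       # 0 means no index was drawn twice
```

Because sample() draws without replacement by default, the two index sets are disjoint and together cover every row of Auto.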

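(Here we use a shortcut in the sample() function: when its first argument is a single positive number n, it samples from 1:n, so sample(392, 196) is equivalent to sample(1:392, 196); see ?sample for details.) We then use the subset option in lm() to fit a linear regression using only the observations corresponding to the training set.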

lm.fit <- lm(mpg ~ horsepower, data = Auto, subset = train)
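Since the goal of the validation set approach is to estimate test error, one natural next step is to compute the mean squared error on the observations that were held out. The sketch below is one way to do this; the names pred and val.mse are illustrative and do not come from the lab text.

```r
library(ISLR2)

set.seed(1)
train <- sample(392, 196)
lm.fit <- lm(mpg ~ horsepower, data = Auto, subset = train)

# Predict mpg for every row of Auto, then keep only the held-out rows.
# Negative indexing with -train drops the training observations.
pred <- predict(lm.fit, newdata = Auto)
val.mse <- mean((Auto$mpg - pred)[-train]^2)   # hypothetical name
val.mse
```

The exact value depends on the seed and on the version of R, because the random number generator behind sample() has changed between releases.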

Below, we have set a different seed. This changes which observations the same code selects for the training set. Repeat the steps illustrated above, then use the summary() function on each fit to see how the two models differ.

Assume that:

The ISLR2 library has been loaded
The Auto dataset has been loaded and attached
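One way to carry out this comparison is sketched below. The second seed value (2) is an arbitrary choice made for illustration, not necessarily the seed used in the exercise, and the names train1, train2, fit1, fit2, and comparison are hypothetical.

```r
library(ISLR2)

# Fit on the split produced by seed 1, as in the lab text
set.seed(1)
train1 <- sample(392, 196)
fit1 <- lm(mpg ~ horsepower, data = Auto, subset = train1)

# Refit on the split produced by a different seed (2 is an assumed value)
set.seed(2)
train2 <- sample(392, 196)
fit2 <- lm(mpg ~ horsepower, data = Auto, subset = train2)

# Coefficient estimates side by side; summary(fit1) and summary(fit2)
# give the full output, including standard errors and R-squared
comparison <- cbind(seed1 = coef(fit1), seed2 = coef(fit2))
comparison
```

The two fits use the same formula and the same data set, so any difference in the estimates comes only from which 196 observations ended up in the training set.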