We just saw that it is possible to choose among a set of models of different
sizes using
In order to use the validation set approach, we begin by splitting the
observations into a training set and a test set. We do this by creating
a random logical vector, train, whose elements are TRUE if the
corresponding observation is in the training set, and FALSE otherwise.
The vector test has a TRUE if the observation is in the test set, and a
FALSE otherwise. Note that the ! in the command to create test causes
TRUEs to be switched to FALSEs and vice versa. We also set a random seed
so that the user will obtain the same training set/test set split.
set.seed(1)
train <- sample(c(TRUE, FALSE), nrow(Hitters), replace = TRUE)
test <- (!train)
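The logical-mask split can be checked on a small example. The sketch below is illustrative only: it uses a hypothetical ten-row data frame called toy (not part of the lab) and shows that train and test are exact complements, and that the split is only roughly half and half because each row is assigned independently.

```r
# Toy data frame standing in for a real dataset (hypothetical, 10 rows).
set.seed(1)
toy <- data.frame(x = 1:10, y = rnorm(10))

# Each row is independently labelled TRUE (training) or FALSE (test).
train <- sample(c(TRUE, FALSE), nrow(toy), replace = TRUE)
test <- !train

# Logical indexing keeps the rows whose mask entry is TRUE.
toy.train <- toy[train, ]
toy.test <- toy[test, ]

# Every row lands in exactly one of the two sets.
stopifnot(all(xor(train, test)))
stopifnot(nrow(toy.train) + nrow(toy.test) == nrow(toy))

# Counts per set; these need not be equal.
table(train)
```

If an exact split of a given size is needed, sampling row indices instead of a logical mask is a common alternative, though that is not what this lab does.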
Now, we apply regsubsets() to the training set in order to perform best
subset selection.
regfit.best <- regsubsets(Salary ~ ., data = Hitters[train,], nvmax = 19)
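To see what a fit on the training rows looks like, here is a minimal sketch on a different dataset. It assumes the leaps package is installed and uses the built-in swiss data as a stand-in (an assumption for illustration, not part of the lab), with Fertility as the response and the remaining five variables as predictors.

```r
library(leaps)  # provides regsubsets()

set.seed(1)
# Random logical mask over the rows of swiss (built into R).
train <- sample(c(TRUE, FALSE), nrow(swiss), replace = TRUE)

# swiss has 5 predictors besides Fertility, so nvmax = 5
# lets regsubsets() consider models of every size.
fit <- regsubsets(Fertility ~ ., data = swiss[train, ], nvmax = 5)

# Which variables enter the best model of each size (training rows only).
summary(fit)$which

# Coefficients of the best three-variable model: intercept plus 3 slopes.
coef(fit, id = 3)
```

Only the training rows are passed to regsubsets(), so the held-out rows remain available for estimating the test error of each model size later.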
Try creating a training and test set for the Boston dataset and store
them in train and test, respectively.
Use this training set in the regsubsets() function with medv as the
response and all other variables as predictors, and store the result in
regfit.best.
Use 13 for the nvmax parameter.
Assume that:
The MASS and leaps libraries have been loaded
The Boston dataset has been loaded and attached