We now use the predict() function to estimate the response for all 392 observations, and we use the mean() function to calculate the \(MSE\) of the 196 observations in the validation set. Note that the -train index below selects only the observations that are not in the training set.
> attach(Auto)
> mean((mpg - predict(lm.fit, Auto))[-train]^2)
[1] 23.26601
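The same calculation can be written without attach(), which makes it clearer which data frame each variable comes from. The following is only a minimal sketch: it uses R's built-in mtcars data as a stand-in for Auto (an assumption, so that the snippet runs without ISLR2), and the names train_idx, fit, sq_err, and val_mse are illustrative rather than taken from the lab.

```r
# Stand-in data: mtcars ships with base R (32 rows), so ISLR2 is not needed.
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(n, n / 2)   # row numbers of the training half

# Fit on the training rows only.
fit <- lm(mpg ~ hp, data = mtcars, subset = train_idx)

# Predict for every row, then keep only the rows NOT in train_idx.
sq_err <- (mtcars$mpg - predict(fit, mtcars))^2
val_mse <- mean(sq_err[-train_idx])

# Equivalent: predict on the validation rows directly.
val_rows <- mtcars[-train_idx, ]
val_mse2 <- mean((val_rows$mpg - predict(fit, val_rows))^2)
```

Both val_mse and val_mse2 hold the same validation-set MSE; the negative index simply drops the training rows before averaging.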
Therefore, the estimated test \(MSE\) for the linear regression fit is 23.27. We can use the poly() function to estimate the test error for the quadratic and cubic regressions.
> lm.fit2 <- lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
> mean((mpg - predict(lm.fit2, Auto))[-train]^2)
[1] 18.71646
> lm.fit3 <- lm(mpg ~ poly(horsepower, 3), data = Auto, subset = train)
> mean((mpg - predict(lm.fit3, Auto))[-train]^2)
[1] 18.79401
These error rates are 18.72 and 18.79, respectively.
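The three fits differ only in the polynomial degree, so the comparison can also be written as a loop. This is a sketch under the same assumption as before: mtcars stands in for Auto so the code is self-contained, and the names train_idx and val_mse are illustrative.

```r
# Stand-in data: mtcars from base R; replace with Auto and horsepower
# to mirror the lab.
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(n, n / 2)

# Validation-set MSE for polynomial degrees 1, 2, and 3.
val_mse <- sapply(1:3, function(d) {
  fit <- lm(mpg ~ poly(hp, d), data = mtcars, subset = train_idx)
  mean((mtcars$mpg - predict(fit, mtcars))[-train_idx]^2)
})
names(val_mse) <- c("linear", "quadratic", "cubic")

val_mse
```

Because the split is random, a different seed gives a different training set and therefore different validation errors, so small gaps between degrees should not be over-interpreted.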
Adding a third-order term did not seem to improve the model, but adding another variable might. Try adding the year variable to lm.fit2 and calculate the validation-set \(MSE\) of the new model:
Hint: You could use the update() function here! (If you forgot how, take a look at this exercise.)
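As a reminder of how update() behaves, here is a minimal sketch on a different data set, so that it does not give away the exercise. It assumes R's built-in mtcars data, and the names base_fit and bigger_fit are illustrative.

```r
# Stand-in data: mtcars from base R, not the Auto data used in the exercise.
set.seed(1)
train_idx <- sample(nrow(mtcars), nrow(mtcars) / 2)

base_fit <- lm(mpg ~ poly(hp, 2), data = mtcars, subset = train_idx)

# ". ~ . + wt" keeps the existing response and terms and appends wt.
# The data and subset arguments of the original call are reused.
bigger_fit <- update(base_fit, . ~ . + wt)

formula(bigger_fit)
```

update() re-evaluates the original lm() call with the modified formula, so the new model is fitted on the same training rows as the old one.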
Assume that:
- The ISLR2 library has been loaded
- The Auto dataset has been loaded and attached