We now use the predict() function to estimate the response for all 392 observations, and the mean() function to calculate the \(MSE\) of the 196 observations in the validation set. Note that the -train index below selects only the observations that are not in the training set.

> attach(Auto)
> mean((mpg - predict(lm.fit, Auto))[-train]^2)
[1] 23.26601

Therefore, the estimated test \(MSE\) for the linear regression fit is 23.27. We can use the poly() function to estimate the test error for the quadratic and cubic regressions.

> lm.fit2 <- lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
> mean((mpg - predict(lm.fit2, Auto))[-train]^2)
[1] 18.71646
> lm.fit3 <- lm(mpg ~ poly(horsepower, 3), data = Auto, subset = train)
> mean((mpg - predict(lm.fit3, Auto))[-train]^2)
[1] 18.79401

These estimated test \(MSE\)s are 18.72 and 18.79, respectively.

Adding a third-order term did not appear to improve the model, but adding another variable might. Try adding the year variable to lm.fit2 and calculate the test \(MSE\) of the new model:

Hint: You could use the update() function here! (If you forgot how, take a look at this exercise.)
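As a reminder, update() refits an existing model after modifying its formula; the dots stand for "everything that was already there" on each side of the ~. A sketch of the general pattern (old.fit, new.fit, and x2 are placeholder names, not objects from this lab):

> new.fit <- update(old.fit, . ~ . + x2)

This refits old.fit with the predictor x2 added to the right-hand side, keeping the original response, data, and subset arguments.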


Assume that: