The cv.glm() function can also be used to implement k-fold CV. Below we use k = 10, a common choice for k, on the Auto data set. We once again set a random seed and initialize a vector in which we will store the CV errors corresponding to the polynomial fits of orders one to ten.

> set.seed(17)
> cv.error.10 <- rep(0, 10)
> for (i in 1:10) {
+   glm.fit <- glm(mpg ~ poly(horsepower, i), data = Auto)
+   cv.error.10[i] <- cv.glm(Auto, glm.fit, K = 10)$delta[1]
+ }
> cv.error.10
 [1] 24.27207 19.26909 19.34805 19.29496 19.03198 18.89781 19.12061 19.14666
 [9] 18.87013 20.95520
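
The degree with the smallest estimated test error can be read off directly from this vector; for example:

> which.min(cv.error.10)
[1] 9

As discussed below, however, the improvement over a quadratic fit at this degree is negligible.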

Notice that the computation time is much shorter than that of LOOCV. (In principle, the computation time for LOOCV for a least squares linear model should be faster than for k-fold CV, due to the availability of this formula:

\[CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1-h_i}\right)^2\]

for LOOCV; however, the cv.glm() function does not make use of this formula. A direct computation of this shortcut is sketched below.) We still see little evidence that using cubic or higher-order polynomial terms leads to lower test error than simply using a quadratic fit. We saw in the previous section that when LOOCV is performed, the two numbers associated with delta are essentially the same. When we instead perform k-fold CV, the two numbers differ slightly. The first is the standard k-fold CV estimate, as in:

\[CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k}\text{MSE}_i\]

The second is a bias-corrected version. On this data set, the two estimates are very similar to each other.
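
Since these models are fit by least squares, the LOOCV shortcut formula above can also be computed directly if desired. The following is a minimal sketch for the quadratic fit, assuming the Auto data are still loaded; lm() and hatvalues() supply the fitted values and the leverage values h_i that appear in the formula.

> # leverage-based LOOCV for the quadratic least squares fit
> lm.fit <- lm(mpg ~ poly(horsepower, 2), data = Auto)
> h <- hatvalues(lm.fit)
> loocv.quad <- mean(((Auto$mpg - fitted(lm.fit)) / (1 - h))^2)

Because no refitting is required, this computation is essentially instantaneous, whereas cv.glm() with K equal to the number of observations must fit the model n times.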

Questions

Using the Boston data set:


Assume that: