We saw that the cv.glm() function can be used to compute the LOOCV test error estimate. Alternatively, one can compute this estimate using just the glm() and predict.glm() functions and a for loop. You will now take this approach to compute the LOOCV error for a simple logistic regression model on the Weekly data set. Recall that in the context of classification problems, the LOOCV error is given by

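\[\textrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \textrm{Err}_i\]

where \(\textrm{Err}_i = I(y_i \neq \hat{y}_i)\) equals 1 if the ith observation is misclassified and 0 otherwise.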
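As a small illustration of this formula, here is a sketch in R on made-up labels. The vectors `y` and `y_hat` are hypothetical and are not taken from the Weekly data. Each error indicator is 1 when the prediction differs from the observed label, and the LOOCV error is the mean of those indicators.

```r
# Hypothetical observed labels and leave-one-out predictions (1 = Up, 0 = Down)
y     <- c(1, 0, 1, 1, 0)
y_hat <- c(1, 1, 1, 0, 0)

# Err_i is 1 for a misclassification and 0 otherwise
err <- as.numeric(y != y_hat)

# The LOOCV error is the average of the n indicators
cv_n <- mean(err)
cv_n  # 0.4: two of the five observations are misclassified
```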

Before we begin, convert the dependent variable to a numeric vector.

Weekly$Direction <- as.numeric(Weekly$Direction == "Up")

Also, keep only the first 200 observations to avoid computational problems on Dodona.

Weekly <- Weekly[1:200, ]
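To see what these two preparation steps do, here is a sketch on a small made-up data frame. The name `toy` and its values are hypothetical stand-ins, not the Weekly data. Comparing a factor with "Up" gives a logical vector, which as.numeric() turns into 1 and 0, and row indexing keeps only the leading rows.

```r
# Hypothetical stand-in for Weekly, with Direction stored as a factor
toy <- data.frame(
  Lag1 = c(0.8, -0.3, 1.2, -1.5),
  Direction = factor(c("Up", "Down", "Up", "Down"))
)

# TRUE/FALSE from the comparison becomes 1/0
toy$Direction <- as.numeric(toy$Direction == "Up")

# Keep only the first three rows (and all columns)
toy <- toy[1:3, ]
toy$Direction  # 1 0 1
```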

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Fit a logistic regression model that predicts Direction using Lag1 and Lag2. Store the model in glm.fit1.

  2. Fit a logistic regression model that predicts Direction using Lag1 and Lag2, fitted on all but the first observation. You will need to set the subset argument of glm(). Store the model in glm.fit2.

  3. Use the model from question 2 to predict the direction of the first observation. Predict that the first observation will go up if \(P(\textrm{Direction} = \textrm{Up} \mid \textrm{Lag1}, \textrm{Lag2}) > 0.5\). Was this observation correctly classified? Store the prediction in glm.pred2 and whether the prediction is correct in glm.correct2. Make sure that the predicted class label glm.pred2 is numeric (1 or 0), not logical (TRUE or FALSE).

  4. Write a loop from \(i = 1\) to \(i = n\), where \(n\) is the number of observations in the data set, that performs each of the following steps:

    1. Fit a logistic regression model using all but the ith observation to predict Direction using Lag1 and Lag2. Store the model in glm.fit3.

    2. Compute the posterior probability of the market moving up for the ith observation. Store this in glm.prob3.

    3. Use the posterior probability for the ith observation in order to predict whether the market moves up. Store this in glm.pred3.

    4. Determine whether an error was made in predicting the direction for the ith observation. If an error was made, indicate this as a 1, and otherwise indicate it as a 0. Store this in the ith element of the vector error. Don’t forget to create this empty vector before the for loop.

    Notice that the variables glm.fit3, glm.prob3, and glm.pred3 are overwritten in each iteration. They are given their own names so that you do not reuse the names from question 3 and overwrite your earlier solution.

  5. Take the average of the \(n\) numbers obtained in error in order to obtain the LOOCV estimate for the test error. Store the result in the variable loocv.
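The general pattern behind questions 4 and 5 can be sketched on simulated data. This is only an illustration under stated assumptions: the data frame `sim`, its columns `x1`, `x2`, `y`, and the names `fit`, `prob`, `pred`, `err`, and `loocv_sim` are all hypothetical and are not the names Dodona checks. The sketch uses only base R.

```r
# Simulated data (an assumption for illustration; not the Weekly data)
set.seed(1)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.5 * sim$x1 - 0.5 * sim$x2))

# Create the vector of error indicators before the loop
err <- rep(NA_real_, n)

for (i in seq_len(n)) {
  # Fit on all observations except the ith
  fit <- glm(y ~ x1 + x2, data = sim, family = binomial, subset = -i)

  # Predicted probability that y = 1 for the held-out observation
  prob <- predict(fit, newdata = sim[i, ], type = "response")

  # Classify with a 0.5 threshold, as a numeric 0/1 label
  pred <- as.numeric(prob > 0.5)

  # 1 if the held-out observation was misclassified, 0 otherwise
  err[i] <- as.numeric(pred != sim$y[i])
}

# LOOCV estimate of the test error: the mean of the n indicators
loocv_sim <- mean(err)
loocv_sim
```

Because each model is refitted from scratch, the loop fits n models, which is why the exercise restricts the data to a small number of rows.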


Assume that: