We saw that the cv.glm() function can be used to compute the LOOCV estimate of the test error. Alternatively, one could compute this estimate using only the glm() and predict.glm() functions and a for loop. You will now take this approach to compute the LOOCV error for a simple logistic regression model on the Weekly data set.
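Recall that in the context of classification problems, the LOOCV error is given by
\[\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Err}_i,\]
where \(\mathrm{Err}_i = I(y_i \neq \hat{y}_i)\) equals 1 if the \(i\)th observation is misclassified by the model fitted without it, and 0 otherwise.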
Before we begin, convert the response variable Direction to a numeric vector (1 for Up, 0 for Down).
Weekly$Direction <- as.numeric(Weekly$Direction == "Up")
Also, keep only the first 200 observations to keep the computation time manageable on Dodona.
Weekly <- Weekly[1:200, ]
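For comparison, here is a minimal sketch of the cv.glm() route mentioned at the start, assuming the ISLR2 and boot packages are installed (cv.glm() lives in boot). The default cost of cv.glm() is the mean squared error, so a misclassification cost is passed instead: it counts an error whenever the predicted probability falls on the wrong side of 0.5. With that cost, delta[1] is a LOOCV misclassification rate that the loop described below should reproduce up to floating-point rounding. The names fit.all, cost and cv.err are illustrative only.

```r
library(ISLR2)
library(boot)   # provides cv.glm()

# Same setup as above: 0/1 response, first 200 weeks only
Weekly$Direction <- as.numeric(Weekly$Direction == "Up")
Weekly <- Weekly[1:200, ]

fit.all <- glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)

# r = observed 0/1 response, pi = predicted probability of Up
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

# K defaults to n, i.e. leave-one-out; delta[1] is the raw CV estimate
cv.err <- cv.glm(Weekly, fit.all, cost = cost)
cv.err$delta[1]
```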
Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.
Fit a logistic regression model that predicts Direction using Lag1 and Lag2. Store the model in glm.fit1.
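A minimal sketch of this first fit, assuming ISLR2 is installed; the block repeats the setup lines so that it runs on its own. Passing family = binomial is what makes glm() fit a logistic regression rather than a linear one.

```r
library(ISLR2)

# Setup from the exercise: 0/1 response and first 200 weeks only
Weekly$Direction <- as.numeric(Weekly$Direction == "Up")
Weekly <- Weekly[1:200, ]

# Logistic regression of Direction on Lag1 and Lag2, using all 200 rows
glm.fit1 <- glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)
summary(glm.fit1)
```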
Fit a logistic regression model that predicts Direction from Lag1 and Lag2, using all observations except the first one. You will need to adjust the subset argument of glm(). Store the model in glm.fit2.
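One way to leave out the first observation, sketched under the same assumptions as before: a negative index in subset drops that row before fitting. Passing data = Weekly[-1, ] instead gives the same coefficients; subset is shown because it is the argument the exercise asks you to adjust.

```r
library(ISLR2)

Weekly$Direction <- as.numeric(Weekly$Direction == "Up")
Weekly <- Weekly[1:200, ]

# subset = -1 drops row 1, so the model is trained on observations 2..200
glm.fit2 <- glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial,
                subset = -1)
coef(glm.fit2)
```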
Use glm.fit2 to predict the direction of the first observation: predict that the market goes up in the first week if \(P(\text{Direction} = \text{Up} \mid \text{Lag1}, \text{Lag2}) > 0.5\). Was this observation correctly classified? Store the prediction in glm.pred2 and whether it is correct in glm.correct2.
Make sure that the predicted class label glm.pred2 is numeric (1 or 0), not logical (TRUE or FALSE).
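A sketch of this prediction step, assuming the setup above. predict() with type = "response" returns the probability of Up rather than the log-odds, and as.numeric() turns the logical comparison into the required 1/0 label. Here glm.correct2 is stored as a logical, and prob1 is just a hypothetical helper name for the intermediate probability.

```r
library(ISLR2)

Weekly$Direction <- as.numeric(Weekly$Direction == "Up")
Weekly <- Weekly[1:200, ]
glm.fit2 <- glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial,
                subset = -1)

# Estimated P(Direction = Up | Lag1, Lag2) for the held-out first week
prob1 <- predict(glm.fit2, newdata = Weekly[1, ], type = "response")

# Threshold at 0.5; as.numeric() converts TRUE/FALSE to 1/0
glm.pred2 <- as.numeric(prob1 > 0.5)

# TRUE if the first week was classified correctly
glm.correct2 <- glm.pred2 == Weekly$Direction[1]
glm.pred2
glm.correct2
```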
Write a loop from \(i = 1\) to \(i = n\), where \(n\) is the number of observations in the data set, that performs each of the following steps:
Fit a logistic regression model that predicts Direction from Lag1 and Lag2, using all but the \(i\)th observation. Store the model in glm.fit3.
Compute the posterior probability of the market moving up for the \(i\)th observation. Store this in glm.prob3.
Use the posterior probability for the \(i\)th observation to predict whether the market moves up. Store this in glm.pred3.
Determine whether an error was made in predicting the direction for the \(i\)th observation: indicate an error as a 1 and a correct prediction as a 0. Store this in the \(i\)th element of the vector error. Don’t forget to create this vector before the for loop.
Notice that the variables glm.fit3, glm.prob3 and glm.pred3 are overwritten in each iteration. We nevertheless give them their own names so that, if you copy code from the earlier questions, you do not overwrite your earlier solutions.
Take the average of the \(n\) numbers stored in error to obtain the LOOCV estimate of the test error. Store the result in the variable loocv.
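Putting the loop and the averaging step together, a minimal sketch might look as follows, assuming ISLR2 is installed; it repeats the setup lines so it runs on its own. Dropping the \(i\)th row through subset = -i is one choice; data = Weekly[-i, ] works equally well. Each iteration refits the model, so the loop performs 200 fits in total, and the result should agree with the cv.glm() misclassification rate up to floating-point rounding.

```r
library(ISLR2)

Weekly$Direction <- as.numeric(Weekly$Direction == "Up")
Weekly <- Weekly[1:200, ]

n <- nrow(Weekly)
error <- rep(0, n)   # create the error vector once, before the loop

for (i in 1:n) {
  # Fit on every observation except the i-th
  glm.fit3 <- glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial,
                  subset = -i)
  # Posterior probability that the market moves up in week i
  glm.prob3 <- predict(glm.fit3, newdata = Weekly[i, ], type = "response")
  # Predicted class label: 1 (Up) or 0 (Down)
  glm.pred3 <- as.numeric(glm.prob3 > 0.5)
  # 1 if the prediction for week i is wrong, 0 otherwise
  error[i] <- as.numeric(glm.pred3 != Weekly$Direction[i])
}

# LOOCV estimate of the test error: the fraction of misclassified weeks
loocv <- mean(error)
loocv
```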
Assume that:
the ISLR2 library has been loaded;
the Weekly data set has been loaded and attached.