Given these predictions, the table() function can be used to produce a
confusion matrix in order to determine how many observations were correctly
or incorrectly classified.
> table(glm.pred, Direction)
        Direction
glm.pred Down  Up
    Down  145 141
    Up    457 507
> (507 + 145) / 1250
[1] 0.5216
> mean(glm.pred == Direction)
[1] 0.5216
The diagonal elements of the confusion matrix indicate correct predictions,
while the off-diagonals represent incorrect predictions. Hence our model
correctly predicted that the market would go up on 507 days and that
it would go down on 145 days, for a total of 507 + 145 = 652 correct
predictions. The mean() function can be used to compute the fraction of
days for which the prediction was correct (the accuracy). In this case, logistic
regression correctly predicted the movement of the market 52.2% of the time.
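As a small check on the arithmetic, the sketch below rebuilds the confusion matrix from the four counts reported in the output and sums its diagonal. The object names conf_mat, correct, total and accuracy are ours, chosen for illustration; they do not appear in the lab.

```r
# Rebuild the confusion matrix from the four reported counts.
# Rows are the predictions, columns are the true Direction.
# matrix() fills column by column, so the first pair is the "Down" column.
conf_mat <- matrix(
  c(145, 457,    # true Down: predicted Down, predicted Up
    141, 507),   # true Up:   predicted Down, predicted Up
  nrow = 2,
  dimnames = list(glm.pred  = c("Down", "Up"),
                  Direction = c("Down", "Up"))
)

correct  <- sum(diag(conf_mat))  # diagonal = predictions that match the truth
total    <- sum(conf_mat)        # all observations
accuracy <- correct / total      # fraction of correct predictions
print(accuracy)
```

Summing the diagonal only gives the number of correct predictions when the rows and columns list the classes in the same order, which is why the row order of the table matters.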
At first glance, it appears that the logistic regression model is working
a little better than random guessing. However, this result is misleading
because we trained and tested the model on the same set of 1,250 observations.
In other words, 100% - 52.2% = 47.8% is the training error rate.
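The relationship between accuracy and training error rate can be sketched as follows. Because the lab's glm.pred and Direction vectors are not available here, the code reconstructs stand-in vectors (named pred and truth, our own names) from the four reported counts; only the counts, not the order of the days, are taken from the output.

```r
# Stand-in vectors rebuilt from the reported counts (the order of days is arbitrary).
counts <- c(145, 457, 141, 507)
pred   <- rep(c("Down", "Up",   "Down", "Up"), times = counts)  # predicted label
truth  <- rep(c("Down", "Down", "Up",   "Up"), times = counts)  # true label

accuracy    <- mean(pred == truth)  # share of matching labels
train_error <- mean(pred != truth)  # share of mismatching labels

print(accuracy)
print(train_error)
```

The two quantities always sum to one, so either can be obtained from the other.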
Perform these same steps for a cut-off value of 0.6 and see whether it performs better or worse.
Assume that the ISLR2 library has been loaded and that the Smarket dataset
has been loaded and attached.
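One possible way to organise the comparison is a small helper that applies a cut-off to a vector of predicted probabilities and reports the confusion matrix and accuracy. This is only a sketch: evaluate_cutoff is a hypothetical function of our own (not part of ISLR2), the toy vectors are invented purely to exercise it, and the commented call assumes the fitted probabilities from the earlier step are stored in an object named glm.probs.

```r
# Hypothetical helper: label a day "Up" when its predicted probability
# exceeds the cut-off and "Down" otherwise, then compare with the truth.
evaluate_cutoff <- function(probs, truth, cutoff) {
  pred <- ifelse(probs > cutoff, "Up", "Down")
  # Fix the levels so both rows appear even if one label is never predicted.
  pred <- factor(pred, levels = c("Down", "Up"))
  list(confusion = table(pred, truth),
       accuracy  = mean(pred == truth))
}

# Toy inputs, invented for illustration only (not Smarket data).
toy_probs <- c(0.45, 0.55, 0.62, 0.70, 0.58)
toy_truth <- factor(c("Down", "Up", "Up", "Down", "Up"),
                    levels = c("Down", "Up"))

res_05 <- evaluate_cutoff(toy_probs, toy_truth, 0.5)
res_06 <- evaluate_cutoff(toy_probs, toy_truth, 0.6)
print(res_05$accuracy)
print(res_06$accuracy)

# On the lab data the call might look like this (glm.probs is an assumed name):
# evaluate_cutoff(glm.probs, Direction, 0.6)
```

Raising the cut-off makes the classifier more reluctant to predict "Up", so the counts shift toward the "Down" row of the confusion matrix; whether accuracy improves depends on the data.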