Given these predictions, the table() function can be used to produce a confusion matrix in order to determine how many observations were correctly or incorrectly classified.

> table(glm.pred, Direction)
        Direction
glm.pred Down  Up
    Down  145 141
    Up    457 507
> (507 + 145) / 1250
[1] 0.5216
> mean(glm.pred == Direction)
[1] 0.5216

The diagonal elements of the confusion matrix indicate correct predictions, while the off-diagonal elements represent incorrect predictions. Hence our model correctly predicted that the market would go up on 507 days and that it would go down on 145 days, for a total of 507 + 145 = 652 correct predictions. The mean() function can be used to compute the fraction of days for which the prediction was correct, i.e. the accuracy. In this case, logistic regression correctly predicted the movement of the market 52.2% of the time. At first glance, it appears that the logistic regression model is working a little better than random guessing. However, this result is misleading because we trained and tested the model on the same set of 1,250 observations. In other words, 100% - 52.2% = 47.8% is the training error rate.
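The training error rate can also be computed directly; a minimal sketch, assuming glm.pred and Direction are the same vectors used above:

# Fraction of training days that were misclassified (the training error rate)
mean(glm.pred != Direction)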

Perform the same steps using a cut-off value of 0.6 and see whether that performs better or worse; a sketch of one way to set this up is shown below.
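A minimal sketch of the re-thresholding step, assuming glm.probs holds the fitted probabilities from the earlier predict() call (that variable name is an assumption):

# Re-classify each day using a 0.6 cut-off instead of 0.5
glm.pred.6 <- rep("Down", 1250)
glm.pred.6[glm.probs > 0.6] <- "Up"

# Confusion matrix and overall accuracy for the new cut-off
table(glm.pred.6, Direction)
mean(glm.pred.6 == Direction)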


Assume that: