We recall that the logistic regression model had very underwhelming
p-values associated with all of the predictors, and that the smallest
p-value, though not very small, corresponded to Lag1. Perhaps by removing the
variables that appear not to be helpful in predicting Direction, we can
obtain a more effective model. After all, using predictors that have no
relationship with the response tends to cause a deterioration in the test
error rate (since such predictors cause an increase in variance without a
corresponding decrease in bias), and so removing such predictors may in
turn yield an improvement. Below we have refit the logistic regression using
just Lag1 and Lag2, which seemed to have the highest predictive power in
the original logistic regression model.
> glm.fit <- glm(Direction ~ Lag1 + Lag2,
    data = Smarket, family = binomial, subset = train)
> glm.probs <- predict(glm.fit, Smarket.2005, type = "response")
> glm.pred <- rep("Down", 252)
> glm.pred[glm.probs > .5] <- "Up"
> table(glm.pred, Direction.2005)
        Direction.2005
glm.pred Down  Up
    Down   35  35
    Up     76 106
> mean(glm.pred == Direction.2005)
[1] 0.56
> 106 / (106 + 76)
[1] 0.582
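As a check on the figures above, the overall accuracy and the per-prediction accuracy rates can be recomputed from the four confusion-matrix counts alone. The following is a minimal sketch that does not need the Smarket data; the object names conf, acc, acc.up and acc.down are illustrative and are not part of the lab.

```r
# Confusion matrix entered by hand from the table above.
# Rows are predictions (glm.pred), columns are the true Direction.2005.
conf <- matrix(c(35, 76, 35, 106), nrow = 2,
               dimnames = list(glm.pred = c("Down", "Up"),
                               Direction.2005 = c("Down", "Up")))

# Overall accuracy: correct predictions (the diagonal) over all 252 days.
acc <- sum(diag(conf)) / sum(conf)                       # 141 / 252

# Accuracy on days when the model predicts "Up": 106 / (106 + 76).
acc.up <- conf["Up", "Up"] / sum(conf["Up", ])

# Accuracy on days when the model predicts "Down": 35 / (35 + 35).
acc.down <- conf["Down", "Down"] / sum(conf["Down", ])

round(c(overall = acc, up = acc.up, down = acc.down), 3)
```

Dividing by a row total conditions on the prediction, which is why 106 / (106 + 76) is read as the accuracy on days when an increase is predicted.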
Now the results appear to be more promising: 56% of the daily movements have been correctly predicted. The confusion matrix suggests that on days when logistic regression predicts that the market will decline, it is only correct 50% of the time. However, on days when it predicts an increase in the market, it has a 58% accuracy rate.

Suppose that we want to predict the returns associated with particular values of Lag1 and Lag2. In particular, we want to predict Direction on a day when Lag1 and Lag2 equal 1.2 and 1.1, respectively, and on a day when they equal 1.5 and −0.8. We do this using the predict() function; with type = "response" it returns the fitted probability that the market goes up.
> predict(glm.fit,
    newdata = data.frame(Lag1 = c(1.2, 1.5), Lag2 = c(1.1, -0.8)),
    type = "response")
        1         2
0.4791462 0.4960939
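To make explicit what predict() does with newdata, the sketch below fits a logistic regression to synthetic data and reproduces the predicted probabilities by hand from the fitted coefficients. The data set toy, its generating coefficients, and the names toy.fit, new.days, p.predict and p.manual are all assumptions made for illustration; this is not the Smarket data, so the numbers will differ from those above.

```r
# Synthetic stand-in for the lab data (an assumption, not Smarket).
set.seed(1)
n <- 200
toy <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
toy$Up <- rbinom(n, 1, plogis(0.1 - 0.3 * toy$Lag1 + 0.2 * toy$Lag2))

toy.fit <- glm(Up ~ Lag1 + Lag2, data = toy, family = binomial)

# Each row of newdata is one day; columns are matched to predictors by name.
# Within each c(), the first value belongs to day 1 and the second to day 2.
new.days <- data.frame(Lag1 = c(1.2, 1.5), Lag2 = c(1.1, -0.8))
p.predict <- predict(toy.fit, newdata = new.days, type = "response")

# The same probabilities by hand: inverse logit of the linear predictor.
b <- coef(toy.fit)
p.manual <- plogis(b["(Intercept)"] +
                   b["Lag1"] * new.days$Lag1 +
                   b["Lag2"] * new.days$Lag2)

all.equal(unname(p.predict), unname(p.manual))
```

Because the columns are matched by name, swapping the order of the Lag1 and Lag2 columns in the data frame does not change the result, but swapping the values inside a c() call assigns them to the wrong day.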
Try predicting the returns on a day when Lag1 and Lag2 equal -0.8 and 1.3, respectively, and on a day when they equal 1.2 and -0.6.
Note: Be mindful of the order of arguments for the newdata parameter.
Assume that the ISLR2 library has been loaded and that the Smarket dataset has been loaded and attached.