Recall that the logistic regression model had very underwhelming p-values associated with all of the predictors, and that the smallest p-value, though not very small, corresponded to Lag1. Perhaps by removing the variables that appear not to be helpful in predicting Direction, we can obtain a more effective model. After all, using predictors that have no relationship with the response tends to cause a deterioration in the test error rate (since such predictors cause an increase in variance without a corresponding decrease in bias), and so removing such predictors may in turn yield an improvement. Below we refit the logistic regression using just Lag1 and Lag2, which seemed to have the highest predictive power in the original logistic regression model.
> glm.fit <- glm(Direction ~ Lag1 + Lag2,
    data = Smarket, family = binomial, subset = train)
> glm.probs <- predict(glm.fit, Smarket.2005, type = "response")
> glm.pred <- rep("Down", 252)
> glm.pred[glm.probs > .5] <- "Up"
> table(glm.pred, Direction.2005)
        Direction.2005
glm.pred Down  Up
    Down   35  35
    Up     76 106
> mean(glm.pred == Direction.2005)
[1] 0.56
> 106 / (106 + 76)
[1] 0.582
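As a check, the accuracy figures in this transcript can be recomputed from the confusion-matrix counts alone. The sketch below types the four counts in by hand (so it does not need ISLR2 or Smarket) and also computes a naive baseline that predicts "Up" on every test day; the variable names are illustrative only.

```r
# Counts copied by hand from the 2005 confusion matrix above
# (rows = predicted direction, columns = true direction).
conf <- matrix(c(35, 76, 35, 106), nrow = 2,
               dimnames = list(pred = c("Down", "Up"),
                               truth = c("Down", "Up")))

n <- sum(conf)                                   # 252 test days
accuracy <- sum(diag(conf)) / n                  # (35 + 106) / 252
prec_down <- conf["Down", "Down"] / sum(conf["Down", ])  # right when predicting Down
prec_up <- conf["Up", "Up"] / sum(conf["Up", ])          # right when predicting Up
naive_up <- sum(conf[, "Up"]) / n                # always predict "Up"

round(c(accuracy = accuracy, down = prec_down,
        up = prec_up, naive = naive_up), 3)
```

Because the market went up on 141 of the 252 test days, the always-"Up" baseline is also right 141/252 ≈ 56% of the time, the same as the model's overall accuracy; the model's edge shows up only in the 58% hit rate on days it predicts an increase.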
Now the results appear to be more promising: 56% of the daily movements have been correctly predicted. The confusion matrix shows that on days when logistic regression predicts that the market will decline, it is correct only 50% of the time (35 of 70 days). However, on days when it predicts an increase in the market, it has a 58% accuracy rate (106 of 182 days).
Suppose that we want to predict the direction of the market associated with particular values of Lag1 and Lag2. In particular, we want to predict Direction on a day when Lag1 and Lag2 equal 1.2 and 1.1, respectively, and on a day when they equal 1.5 and −0.8. We do this using the predict() function, which with type = "response" returns the estimated probability that the market goes up.
> predict(glm.fit, newdata = data.frame(Lag1 = c(1.2, 1.5), Lag2 = c(1.1, -0.8)),
    type = "response")
        1         2
0.4791462 0.4960939
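To see what predict() is doing here, the sketch below uses the built-in mtcars data as a stand-in for Smarket (an assumption made only so the example runs without ISLR2; fit, new_obs and swapped are hypothetical names). It shows that type = "response" is the inverse logit of the default type = "link" linear predictor, and that newdata columns are matched by name: row i pairs the i-th value of each predictor, while the column order of the data frame does not matter.

```r
# Stand-in model on base R data (not Smarket): P(am == 1) from weight and horsepower.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Two hypothetical days/cars: row 1 pairs wt = 2.5 with hp = 110, row 2 pairs 3.5 with 150.
new_obs <- data.frame(wt = c(2.5, 3.5), hp = c(110, 150))

eta <- predict(fit, newdata = new_obs, type = "link")         # log-odds
p_resp <- predict(fit, newdata = new_obs, type = "response")  # probabilities

# "response" applies the inverse logit (plogis) to the linear predictor.
print(all.equal(p_resp, plogis(eta)))

# Columns are matched by name, so reordering them gives the same predictions;
# moving values between wt and hp, by contrast, would describe different observations.
swapped <- data.frame(hp = c(110, 150), wt = c(2.5, 3.5))
print(all.equal(p_resp, predict(fit, newdata = swapped, type = "response")))

# Same 0.5 threshold rule as glm.pred above (am = 1 means manual transmission).
labels <- ifelse(p_resp > 0.5, "manual", "automatic")
print(labels)
```

For a factor response such as Direction, glm() models the probability of the second factor level ("Up"), which is why probabilities above 0.5 are labelled "Up" in the lab.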
Try predicting the probability that the market goes up on a day when Lag1 and Lag2 equal -0.8 and 1.3, respectively, and on a day when they equal 1.2 and -0.6.
Note: Be mindful of the order of the values passed in the newdata argument.
Assume that the ISLR2 library has been loaded and that the Smarket dataset has been loaded and attached.