We recall that the logistic regression model had very underwhelming p-values associated with all of the predictors, and that the smallest p-value, though not very small, corresponded to Lag1. Perhaps by removing the variables that appear not to be helpful in predicting Direction, we can obtain a more effective model. After all, using predictors that have no relationship with the response tends to cause a deterioration in the test error rate (since such predictors cause an increase in variance without a corresponding decrease in bias), and so removing such predictors may in turn yield an improvement. Below we have refit the logistic regression using just Lag1 and Lag2, which seemed to have the highest predictive power in the original logistic regression model.

> glm.fit <- glm(Direction ~ Lag1 + Lag2, 
data = Smarket, family = binomial, subset = train)
> glm.probs <- predict(glm.fit, Smarket.2005, type = "response")
> glm.pred <- rep("Down", 252)
> glm.pred[glm.probs > .5] <- "Up"
> table(glm.pred, Direction.2005)
Direction .2005
glm.pred Down Up
    Down 35   35
    Up   76   106
> mean(glm.pred == Direction.2005)
[1] 0.56
> 106 / (106 + 76)
[1] 0.582

Now the results appear to be more promising: 56 % of the daily movements have been correctly predicted. The confusion matrix suggests that on days when logistic regression predicts that the market will decline, it is only correct 50 % of the time. However, on days when it predicts an increase in the market, it has a 58 % accuracy rate. Suppose that we want to predict the returns associated with particular values of Lag1 and Lag2. In particular, we want to predict Direction on a day when Lag1 and Lag2 equal 1.2 and 1.1, respectively, and on a day when they equal 1.5 and −0.8. We do this using the predict() function.

> predict(glm.fit, newdata = data.frame(Lag1 = c(1.2, 1.5), Lag2 = c(1.1, -0.8)),
 type = "response")
        1         2 
0.4791462 0.4960939 

Try predicting the returns on a day when Lag1 and Lag2 equal -0.8 and 1.3, respectively, and on a day when they equal 1.2 and -0.6:

Note: Be mindful of the order of arguments for the newdata parameter


Assume that: