This question should be answered using the Weekly data set, which is part of the ISLR2 package.
This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.
Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.
Before starting this exercise, have a look at the dataset.
Explore the column names with names(), look at some statistics of the data with summary(), or use any other method you prefer.
Perhaps you can even create plots to find any patterns in the data.
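A first look at the data could go roughly as follows. This is only a sketch: it assumes the ISLR2 package is installed, and the name weekly.cor and the choice of plot are illustrative, not part of the exercise.

```r
# Assumes the ISLR2 package is installed; it provides the Weekly data set.
library(ISLR2)

# Column names, dimensions, and per-column summary statistics.
names(Weekly)
dim(Weekly)
summary(Weekly)

# Pairwise correlations of the numeric columns.
# Direction is a factor, so it is excluded before calling cor().
# weekly.cor is an illustrative name, not one required by the exercise.
weekly.cor <- cor(Weekly[, names(Weekly) != "Direction"])
round(weekly.cor, 2)

# One possible plot: trading volume over the weeks, in row order.
plot(Weekly$Volume, xlab = "Week index", ylab = "Volume")
```

Plots like this one are not checked by Dodona, but they can reveal trends that the summary statistics hide.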
Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Store the model in glm.fit.
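One way to set up this fit is sketched below. It assumes the ISLR2 package is installed and passes the data frame explicitly through the data argument, so it works whether or not Weekly has been attached.

```r
# Assumes the ISLR2 package is installed.
library(ISLR2)

# Logistic regression: family = binomial selects the logit link by default.
# Direction is the response; Lag1..Lag5 and Volume are the predictors.
glm.fit <- glm(
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  data = Weekly,
  family = binomial
)

# Print the fitted coefficients.
glm.fit

# Show how R codes the response: the level coded 1 is the one
# whose probability the model predicts.
contrasts(Weekly$Direction)
```

Checking contrasts() before interpreting predicted probabilities avoids mixing up which class a probability above the threshold refers to.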
Use the summary() function to print the results. Are any of the predictors statistically significant (\(p < 0.05\))? If so, which ones?
Lag1 is significant
Lag2 is significant
Lag1 and Lag2 are significant
Compute the confusion matrix and the accuracy of the model. Use a threshold of 0.5 to classify predicted probabilities as Up or Down (character vector).
Store the confusion matrix in glm.table and the accuracy in glm.acc.
(Note that because we fit the model and evaluate its performance on the full data set, this is referred to as the training accuracy.)
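The confusion matrix and training accuracy could be computed along the following lines. This is a sketch rather than the reference solution: it assumes the ISLR2 package is installed, refits the model so the block runs on its own, and uses the helper names glm.probs and glm.pred, which the exercise does not prescribe.

```r
# Assumes the ISLR2 package is installed.
library(ISLR2)

# Refit the model so this block is self-contained.
glm.fit <- glm(
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  data = Weekly,
  family = binomial
)

# Predicted probabilities on the training data.
# type = "response" returns probabilities instead of log-odds.
# glm.probs and glm.pred are illustrative names.
glm.probs <- predict(glm.fit, type = "response")

# Classify with a 0.5 threshold into a character vector of "Up" / "Down".
glm.pred <- ifelse(glm.probs > 0.5, "Up", "Down")

# Confusion matrix: predicted labels in rows, true labels in columns.
glm.table <- table(Predicted = glm.pred, Actual = Weekly$Direction)
glm.table

# Training accuracy: fraction of weeks where the prediction matches the truth.
glm.acc <- mean(glm.pred == Weekly$Direction)
glm.acc
```

The accuracy equals the sum of the diagonal of glm.table divided by the total number of observations, which is a quick consistency check on both objects.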
Assume that:
the ISLR2 library has been loaded
the Weekly dataset has been loaded and attached