This question should be answered using the Weekly data set, which is part of the ISLR2 package. This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.
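
As a quick sanity check of the description above, the data set can be loaded and inspected as follows. This is a minimal sketch that assumes the ISLR2 package is installed; the guard simply skips the inspection if it is not.

```r
# Assumes ISLR2 is installed; nothing is run otherwise.
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Weekly <- ISLR2::Weekly

  print(dim(Weekly))         # number of weekly observations and columns
  print(names(Weekly))       # column names
  print(range(Weekly$Year))  # first and last year covered
}
```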


Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Before starting this exercise, have a look at the dataset. Explore the column names with names(), look at summary statistics with summary(), or use any other method you prefer. You can also create plots to look for patterns in the data.

  2. Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Store the model in glm.fit.

    • MC1:
      Use the summary() function to print the results. Are any of the predictors statistically significant (\(p < 0.05\))? If so, which ones?
      • 1: All of the variables are significant
      • 2: Only the intercept is significant
      • 3: Only the intercept and Lag1 are significant
      • 4: Only the intercept and Lag2 are significant
      • 5: Only the intercept, Lag1 and Lag2 are significant
      • 6: None of the variables are significant

  3. Compute the confusion matrix and the accuracy of the model. Use a threshold of 0.5 to classify predicted probabilities as Up or Down (character vector). Store the confusion matrix in glm.table and the accuracy in glm.acc.
    (Note that because the model is fit and evaluated on the full dataset, this is referred to as the training accuracy.)

    • MC2:
      Which statement is correct?
      • 1: The model classifies most of the observations as Up
      • 2: The model classifies most of the observations as Down
      • 3: The model classifies observations as Up or Down equally often

    • MC3:
      For weeks when the market went up, what percentage of the time did the model make a correct prediction?
      • 1: 11.16%
      • 2: 56.11%
      • 3: 92.07%
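
The workflow in questions 2 and 3 can be sketched roughly as below. This is one possible approach, not a reference solution: confusion_and_accuracy is a hypothetical helper (it is not part of ISLR2 or base R), the table orientation (predictions in rows, true classes in columns) is an assumption, and the ISLR2 package is assumed to be installed. For a two-level factor response, glm() with family = binomial models the probability of the second level, which is "Up" here.

```r
# Hypothetical helper: threshold probabilities into labels, then tabulate.
confusion_and_accuracy <- function(probs, actual, threshold = 0.5,
                                   labels = c("Down", "Up")) {
  # Probabilities above the threshold become the second label ("Up").
  pred <- ifelse(probs > threshold, labels[2], labels[1])
  # Fixed factor levels keep the table 2 x 2 even if a class is never predicted.
  tab <- table(Predicted = factor(pred, levels = labels),
               Actual = factor(actual, levels = labels))
  # Accuracy is the share of predictions that match the true labels.
  list(table = tab, accuracy = mean(pred == as.character(actual)))
}

# Assumes ISLR2 is installed; nothing is run otherwise.
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Weekly <- ISLR2::Weekly

  # Question 2: logistic regression on the five lags plus Volume.
  glm.fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
                 data = Weekly, family = binomial)
  print(summary(glm.fit))

  # Question 3: fitted P(Direction == "Up") on the training data.
  glm.probs <- predict(glm.fit, type = "response")
  res <- confusion_and_accuracy(glm.probs, Weekly$Direction)
  glm.table <- res$table
  glm.acc <- res$accuracy
}
```

The row and column sums of glm.table show how often each class is predicted and how often it occurs, and dividing a cell by its column sum gives the share of correct predictions for that true class.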


Assume that: