Questions

Some of the exercises are not tested by Dodona (for example, the plots), but it is still useful to try them.

  1. Follow the steps:
    1. Fit the logistic regression model using a training data period from 1990 to 2008, with Lag2 as the only predictor. Store the model in glm.fit.
    2. Compute the confusion matrix and the accuracy of the model for the held-out test data (that is, the data from 2009 and 2010). Use a threshold of 0.5 to classify predicted probabilities as Up or Down (character vector). Store the confusion matrix in glm.table and the accuracy in glm.acc.
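    A minimal sketch of question 1, assuming the Weekly data set from the ISLR2 package; the helper objects train, Weekly.test and Direction.test are introduced here for illustration and reused in the sketches below:

    ```r
    library(ISLR2)

    # Training indicator: 1990-2008; the remaining rows (2009 and 2010) form the held-out test set
    train <- Weekly$Year < 2009
    Weekly.test <- Weekly[!train, ]
    Direction.test <- Weekly$Direction[!train]

    # Logistic regression with Lag2 as the only predictor, fitted on the training period
    glm.fit <- glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)

    # Predicted probabilities on the test years, classified with a 0.5 threshold
    glm.probs <- predict(glm.fit, Weekly.test, type = "response")
    glm.pred <- rep("Down", length(glm.probs))
    glm.pred[glm.probs > 0.5] <- "Up"

    glm.table <- table(glm.pred, Direction.test)   # confusion matrix
    glm.acc <- mean(glm.pred == Direction.test)    # accuracy on the held-out data
    ```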

  2. Repeat question 1 using LDA. Store the model in lda.fit, the confusion matrix in lda.table and the accuracy in lda.acc.
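    A possible sketch, reusing train, Weekly.test and Direction.test from the previous sketch; lda() comes from the MASS package:

    ```r
    library(MASS)

    lda.fit <- lda(Direction ~ Lag2, data = Weekly, subset = train)
    lda.pred <- predict(lda.fit, Weekly.test)$class   # predicted classes for 2009-2010

    lda.table <- table(lda.pred, Direction.test)
    lda.acc <- mean(lda.pred == Direction.test)
    ```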

  3. Repeat question 1 using QDA. Store the model in qda.fit, the confusion matrix in qda.table and the accuracy in qda.acc.
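    QDA follows the same pattern; a sketch under the same assumptions (qda() is also in MASS):

    ```r
    qda.fit <- qda(Direction ~ Lag2, data = Weekly, subset = train)
    qda.pred <- predict(qda.fit, Weekly.test)$class

    qda.table <- table(qda.pred, Direction.test)
    qda.acc <- mean(qda.pred == Direction.test)
    ```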

  4. Repeat question 1 using Naïve Bayes. Store the model in nb.fit, the confusion matrix in nb.table and the accuracy in nb.acc.
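    A sketch assuming the naiveBayes() function from the e1071 package (other implementations exist):

    ```r
    library(e1071)

    nb.fit <- naiveBayes(Direction ~ Lag2, data = Weekly, subset = train)
    nb.pred <- predict(nb.fit, Weekly.test)   # class predictions for 2009-2010

    nb.table <- table(nb.pred, Direction.test)
    nb.acc <- mean(nb.pred == Direction.test)
    ```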

  5. Repeat question 1 using KNN with \(K = 1\). Use a seed value of 1. Store the predictions in knn.pred, the confusion matrix in knn.table and the accuracy in knn.acc. A sketch follows after the list of inputs below.

    Recall that the KNN function knn() requires four inputs:

    1. A matrix containing the predictors associated with the training data, labeled train.X
    2. A matrix containing the predictors associated with the data for which we wish to make predictions, labeled test.X.
    3. A vector containing the class labels for the training observations, labeled train.Direction.
    4. A value for \(K\), the number of nearest neighbors to be used by the classifier.
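    A sketch using knn() from the class package, building the predictor matrices from Lag2 only; the seed is set before the call because knn() breaks ties at random:

    ```r
    library(class)

    train.X <- as.matrix(Weekly$Lag2[train])     # training predictors (1990-2008)
    test.X <- as.matrix(Weekly$Lag2[!train])     # test predictors (2009-2010)
    train.Direction <- Weekly$Direction[train]   # training class labels

    set.seed(1)
    knn.pred <- knn(train.X, test.X, train.Direction, k = 1)

    knn.table <- table(knn.pred, Direction.test)
    knn.acc <- mean(knn.pred == Direction.test)
    ```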

  6. Inspect the results of the 5 models.

    • MC1:
      Which of these methods provides the best results on the held-out test data?
      • 1: Logistic regression
      • 2: LDA
      • 3: QDA
      • 4: KNN
      • 5: Naïve Bayes
      • 6: Logistic regression and LDA, they perform equally well
      • 7: QDA and KNN, they perform equally well
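
    One way to line up the five test-set accuracies before answering MC1, assuming the objects from the sketches above:

    ```r
    round(c(logistic = glm.acc, LDA = lda.acc, QDA = qda.acc,
            naiveBayes = nb.acc, KNN = knn.acc), 3)
    ```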


Assume that: