Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Follow the steps below (illustrative code sketches for questions 1 to 6 appear after this list):
    1. Fit a logistic regression model on the training data period from 1990 to 2008, using Lag2 as the only predictor. Store the model in glm.fit.
    2. Compute the confusion matrix and the accuracy of the model on the held-out test data (that is, the data from 2009 and 2010). Use a threshold of 0.5 to classify the predicted probabilities as Up or Down (a character vector). Store the confusion matrix in glm.table and the accuracy in glm.acc.

  2. Repeat question 1 using LDA. Store the model in lda.fit, the confusion matrix in lda.table and the accuracy in lda.acc.

  3. Repeat question 1 using QDA. Store the model in qda.fit, the confusion matrix in qda.table and the accuracy in qda.acc.

  4. Repeat question 1 using Naïve Bayes. Store the model in nb.fit, the confusion matrix in nb.table and the accuracy in nb.acc. (Use the naiveBayes() function from the e1071 package.)

  5. Repeat question 1 using KNN with \(K = 1\). Use a seed value of 1. Store the predictions in knn.pred, the confusion matrix in knn.table and the accuracy in knn.acc.

    Recall that the KNN function knn() requires four inputs:

    1. A matrix containing the predictors associated with the training data, labeled train.X.
    2. A matrix containing the predictors associated with the data for which we wish to make predictions, labeled test.X.
    3. A vector containing the class labels for the training observations, labeled train.Direction.
    4. A value for \(K\), the number of nearest neighbors to be used by the classifier.

  6. Inspect the results of the five models.

    • MC1:
      Which of these methods provides the best results on the held-out test data?
      • 1: Logistic regression
      • 2: LDA
      • 3: QDA
      • 4: KNN
      • 5: Naïve Bayes
      • 6: Logistic regression and LDA, they perform equally well
      • 7: QDA and KNN, they perform equally well
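
One possible approach to question 1 is sketched below. It assumes the Weekly data set (with Year, Lag2 and Direction columns, as in the ISLR package) is already available; the helper names train, Weekly.test and Direction.test are illustrative, while glm.fit, glm.table and glm.acc are the names requested above.

```r
library(ISLR)                         # assumed source of the Weekly data set

train <- Weekly$Year <= 2008          # training period: 1990 to 2008
Weekly.test <- Weekly[!train, ]       # held-out data: 2009 and 2010
Direction.test <- Weekly$Direction[!train]

# Question 1: logistic regression with Lag2 as the only predictor
glm.fit <- glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs <- predict(glm.fit, Weekly.test, type = "response")
glm.pred <- ifelse(glm.probs > 0.5, "Up", "Down")     # 0.5 threshold, character vector
glm.table <- table(glm.pred, Direction.test)          # confusion matrix
glm.acc <- mean(glm.pred == Direction.test)           # accuracy on the held-out data
```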
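
Questions 2 to 4 follow the same pattern. The sketch below reuses the train, Weekly.test and Direction.test objects from the previous block, with lda() and qda() from the MASS package and naiveBayes() from e1071.

```r
library(MASS)    # lda(), qda()
library(e1071)   # naiveBayes()

# Question 2: linear discriminant analysis
lda.fit <- lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred <- predict(lda.fit, Weekly.test)$class
lda.table <- table(lda.pred, Direction.test)
lda.acc <- mean(lda.pred == Direction.test)

# Question 3: quadratic discriminant analysis
qda.fit <- qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.pred <- predict(qda.fit, Weekly.test)$class
qda.table <- table(qda.pred, Direction.test)
qda.acc <- mean(qda.pred == Direction.test)

# Question 4: naive Bayes
nb.fit <- naiveBayes(Direction ~ Lag2, data = Weekly, subset = train)
nb.pred <- predict(nb.fit, Weekly.test)
nb.table <- table(nb.pred, Direction.test)
nb.acc <- mean(nb.pred == Direction.test)
```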
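
For question 5, note that knn() from the class package returns the predicted labels directly rather than a fitted model object. A sketch under the same assumptions, using the input names listed in the question:

```r
library(class)   # knn()

# Predictor matrices and training labels, named as in the question
train.X <- as.matrix(Weekly$Lag2[train])
test.X  <- as.matrix(Weekly$Lag2[!train])
train.Direction <- Weekly$Direction[train]

set.seed(1)      # seed value of 1 (knn() breaks ties at random)
knn.pred <- knn(train.X, test.X, train.Direction, k = 1)
knn.table <- table(knn.pred, Direction.test)
knn.acc <- mean(knn.pred == Direction.test)
```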
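
For question 6, the five held-out accuracies can be placed side by side before answering the multiple-choice question, for example:

```r
# Held-out accuracies of the five classifiers
round(c(logistic = glm.acc, lda = lda.acc, qda = qda.acc,
        naive.bayes = nb.acc, knn = knn.acc), 3)
```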


Assume that: