This problem involves the OJ data set, which is part of the ISLR2 package.

We first create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

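library(ISLR2)  # provides the OJ data set
set.seed(1)     # make the random split reproducible
train <- sample(1:nrow(OJ), 800)  # row indices of the training observations
OJ.train <- OJ[train, ]
OJ.test <- OJ[-train, ]           # all remaining rows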

Questions

Dodona does not test some of the exercises (for example, the plots), but they are still worth attempting.

  1. Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Set the seed to 2 before fitting the model. Use the summary() function to produce summary statistics about the tree, and describe the results. Store the training misclassification error rate in tree.misclass and the number of terminal nodes in tree.size. (Access the necessary values in your summary object with $.)

  2. Type the name of the tree object to get detailed text output. Pick one of the terminal nodes, and interpret the information displayed.
    • MC2:
      Which of the following statements is correct?
      • 1: The node labelled with ‘8)’ is a terminal node and most of the observations in this branch are of the class MM.
      • 2: The node labelled with ‘8)’ is not a terminal node and most of the observations in this branch are of the class MM.
      • 3: The node labelled with ‘8)’ is a terminal node and most of the observations in this branch are of the class CH.
      • 4: The node labelled with ‘8)’ is not a terminal node and most of the observations in this branch are of the class CH.

  3. Create a plot of the tree, and interpret the results.
    • MC3:
      The most important indicator of Purchase seems to be:
      • 1: PctDiscMM
      • 2: PriceDiff
      • 3: SpecialCH
      • 4: LoyalCH

  4. Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. Store the test error rate in tree.testerror.
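
The workflow in questions 1 to 4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes the tree package is installed and used for fitting, and the object names tree.oj, tree.summary, tree.pred and conf.mat are illustrative choices that do not appear in the exercise. It also assumes that the summary object of a classification tree exposes misclass (the number of misclassified observations followed by the total number of observations) and size (the number of terminal nodes).

```r
library(ISLR2)  # provides the OJ data set
library(tree)   # assumption: trees are fitted with the tree package

# Recreate the train/test split from the start of the exercise.
set.seed(1)
train <- sample(1:nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test <- OJ[-train, ]

# Question 1: fit a classification tree on the training data.
set.seed(2)
tree.oj <- tree(Purchase ~ ., data = OJ.train)
tree.summary <- summary(tree.oj)  # illustrative name
print(tree.summary)

# misclass is assumed to hold c(number misclassified, number of observations).
tree.misclass <- tree.summary$misclass[1] / tree.summary$misclass[2]
tree.size <- tree.summary$size    # number of terminal nodes

# Question 2: detailed text output of the tree.
print(tree.oj)

# Question 3: plot the tree and label the splits.
plot(tree.oj)
text(tree.oj, pretty = 0)

# Question 4: predict class labels on the test set and compare with the truth.
tree.pred <- predict(tree.oj, newdata = OJ.test, type = "class")
conf.mat <- table(predicted = tree.pred, actual = OJ.test$Purchase)
print(conf.mat)

# Test error rate: proportion of test observations predicted incorrectly.
tree.testerror <- mean(tree.pred != OJ.test$Purchase)
```

In the text output of the tree package, terminal nodes are marked with an asterisk, which helps when reading off a single node for question 2. The test error rate can equivalently be computed from the off-diagonal entries of the confusion matrix divided by the number of test observations.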

Assume that: