This problem involves the OJ data set, which is part of the ISLR2 package.

We continue to use the training and test sets from the previous exercise.

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Apply the cv.tree() function to the training set in order to determine the optimal tree size. You should minimize the classification error rate (i.e., not the deviance). Produce a plot with tree size on the \(x\)-axis and cross-validated classification error rate on the \(y\)-axis. Which tree size corresponds to the lowest cross-validated classification error rate? Store this value in tree.size. (Depending on your version of R, the cv.tree() function may be unable to find the training set. If you encounter this error, add model = TRUE to the tree() call that creates tree.oj in the previous exercise.)

  2. Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

  3. Compare the training error rates of the pruned and unpruned trees. Calculate the test error rate for the pruned tree and store it in prunedtree.testerror. Note that pruning may have increased the test error rate! However, it produces a much more interpretable tree.
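
The three steps above can be sketched roughly as follows. This is only one possible outline, not a reference solution. The names oj.train and oj.test, the seed, and the 800-observation split are assumptions standing in for whatever the previous exercise created. Choosing the smallest size when several sizes tie for the lowest error is also an assumption.

```r
library(ISLR2)
library(tree)

# Assumed setup standing in for the previous exercise (hypothetical names and seed).
set.seed(1)
train.idx <- sample(nrow(OJ), 800)
oj.train <- OJ[train.idx, ]
oj.test <- OJ[-train.idx, ]
# model = TRUE stores the model frame so cv.tree() can find the training data.
tree.oj <- tree(Purchase ~ ., data = oj.train, model = TRUE)

# Step 1: cross-validate using misclassification rather than deviance.
cv.oj <- cv.tree(tree.oj, FUN = prune.misclass)
# With FUN = prune.misclass, cv.oj$dev counts cross-validated misclassifications.
cv.error <- cv.oj$dev / nrow(oj.train)
plot(cv.oj$size, cv.error, type = "b",
     xlab = "Tree size", ylab = "CV classification error rate")
# Assumption: on ties, take the smallest tree with the minimal CV error.
tree.size <- min(cv.oj$size[cv.oj$dev == min(cv.oj$dev)])

# Step 2: prune to the selected size, or to five terminal nodes if
# cross-validation selects the unpruned tree.
full.size <- sum(tree.oj$frame$var == "<leaf>")
best.size <- if (tree.size < full.size) tree.size else 5
prune.oj <- prune.misclass(tree.oj, best = best.size)

# Step 3: training error rates of both trees, test error rate of the pruned tree.
error.rate <- function(fit, data) {
  mean(predict(fit, data, type = "class") != data$Purchase)
}
unpruned.trainerror <- error.rate(tree.oj, oj.train)
pruned.trainerror <- error.rate(prune.oj, oj.train)
prunedtree.testerror <- error.rate(prune.oj, oj.test)
```

Because cv.tree() assigns folds at random, the selected tree.size can change with the seed, so the values you obtain may differ from those of another run.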


Assume that: