In order to properly evaluate the performance of a classification tree on
these data, we must estimate the test error rather than simply computing
the training error. We split the observations into a training set and a test
set, build the tree using the training set, and evaluate its performance on
the test data. The predict()
function can be used for this purpose. In the
case of a classification tree, the argument type="class"
instructs R
to return
the actual class prediction. This approach leads to correct predictions for
68% of the locations in the test data set.
set.seed(3)
# split the observations into a 200-observation training set and a test set
train <- sample(1:nrow(Carseats), 200)
Carseats.test <- Carseats[-train, ]
High.test <- High[-train]
# fit the tree on the training set only, then predict classes for the test set
tree.carseats <- tree(High ~ . - Sales, Carseats, subset = train)
tree.pred <- predict(tree.carseats, Carseats.test, type = "class")
table(tree.pred, High.test)
         High.test
tree.pred No Yes
      No  88  41
      Yes 23  48
(88 + 48) / 200
[1] 0.68
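Equivalently, the test accuracy can be computed directly from the prediction vector. The one-liner below is a small convenience sketch that assumes tree.pred and High.test as created above; it reproduces the value 0.68 obtained from the confusion matrix.

mean(tree.pred == High.test)   # proportion of test observations classified correctly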
Now repeat this analysis for the OJ data set:

- Split the observations of the OJ data set into a training set and a test set. Use a seed value of 1. Store the numeric vector with training data indices in train.idx.
- Fit a classification tree on the training data, with Purchase as dependent variable and all other variables as independent variables. Store the model in the variable tree.oj.
- Predict the classes of the test observations. Store the predictions in tree.pred.
- Compute the confusion matrix for the test set. Store it in cf.test (use the table() function, with the predictions as the first argument and the ground truth as the second argument).
- Compute the test accuracy. Store it in acc.test (use the sum() and diag() functions on the confusion matrix).

A sketch of one possible way to structure this code follows below.
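The following is a minimal sketch of that workflow, modeled on the Carseats code above. The training-set size of 800 and the ISLR package as the source of the OJ data are assumptions made purely for illustration, since neither is specified in the instructions; adjust them to match the exercise.

library(tree)
library(ISLR)   # assumed source of the OJ data set

set.seed(1)
train.idx <- sample(1:nrow(OJ), 800)   # 800 is an assumed training-set size
OJ.test <- OJ[-train.idx, ]

tree.oj <- tree(Purchase ~ ., OJ, subset = train.idx)   # Purchase modeled on all other variables
tree.pred <- predict(tree.oj, OJ.test, type = "class")  # class predictions for the test set

cf.test <- table(tree.pred, OJ.test$Purchase)   # predictions first, ground truth second
acc.test <- sum(diag(cf.test)) / sum(cf.test)   # correct predictions / total test observations
acc.test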
Assume that: