In order to properly evaluate the performance of a classification tree on
these data, we must estimate the test error rather than simply computing
the training error. We split the observations into a training set and a test
set, build the tree using the training set, and evaluate its performance on
the test data. The predict()
function can be used for this purpose. In the
case of a classification tree, the argument type="class"
instructs R
to return
the actual class prediction. This approach leads to correct predictions for
68% of the locations in the test data set.
set.seed(3)
# split the observations into a 200-observation training set and a test set
train <- sample(1:nrow(Carseats), 200)
Carseats.test <- Carseats[-train, ]
High.test <- High[-train]
# fit the tree on the training set only, then predict classes for the test set
tree.carseats <- tree(High ~ . - Sales, Carseats, subset = train)
tree.pred <- predict(tree.carseats, Carseats.test, type = "class")
table(tree.pred, High.test)
         High.test
tree.pred No Yes
      No  88  41
      Yes 23  48
(88 + 48) / 200
[1] 0.68
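Equivalently, the test accuracy can be computed directly from the prediction vector. The one-liner below is a small convenience sketch that assumes tree.pred and High.test as created above; it reproduces the value 0.68 obtained from the confusion matrix.

mean(tree.pred == High.test)   # proportion of test observations classified correctly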
Now repeat this analysis for the OJ data set:

- Split the observations of the OJ data set into a training set and a test set. Use a seed value of 1. Store the numeric vector with training data indices in train.idx.
- Fit a classification tree on the training data, with Purchase as dependent variable and all other variables as independent variables. Store the model in the variable tree.oj.
- Predict the classes of the test observations. Store the predictions in tree.pred.
- Compute the confusion matrix for the test set. Store it in cf.test (use the table() function, with the predictions as the first argument and the ground truth as the second argument).
- Compute the test accuracy. Store it in acc.test (use the sum() and diag() functions on the confusion matrix).

A sketch of one possible way to structure this code follows below.
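The following is a minimal sketch of that workflow, modeled on the Carseats code above. The training-set size of 800 and the ISLR package as the source of the OJ data are assumptions made purely for illustration, since neither is specified in the instructions; adjust them to match the exercise.

library(tree)
library(ISLR)   # assumed source of the OJ data set

set.seed(1)
train.idx <- sample(1:nrow(OJ), 800)   # 800 is an assumed training-set size
OJ.test <- OJ[-train.idx, ]

tree.oj <- tree(Purchase ~ ., OJ, subset = train.idx)   # Purchase modeled on all other variables
tree.pred <- predict(tree.oj, OJ.test, type = "class")  # class predictions for the test set

cf.test <- table(tree.pred, OJ.test$Purchase)   # predictions first, ground truth second
acc.test <- sum(diag(cf.test)) / sum(cf.test)   # correct predictions / total test observations
acc.test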
Assume that: