The tree library is used to construct classification and regression trees.
library(tree)
We first use classification trees to analyze the Carseats data set. In these
data, Sales is a continuous variable, and so we begin by recoding it as a
binary variable. We use the ifelse() function to create a variable, called
High, which takes on a value of Yes if the Sales variable exceeds 8, and
takes on a value of No otherwise.
library(ISLR2)
attach(Carseats)
High <- factor(ifelse(Sales <= 8, "No", "Yes"))
Finally, we use the data.frame() function to merge High with the rest of
the Carseats data.
Carseats <- data.frame(Carseats, High)
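The recoding step can be checked on a small example before applying it to the full data. The sketch below uses base R only; the vector toy_sales and the data frame toy are hypothetical names with made-up values, chosen so that one observation sits exactly on the threshold of 8.

```r
# Hypothetical sales figures (not from Carseats); 8 sits exactly on the threshold
toy_sales <- c(4.5, 8, 9.2, 11.0)

# Same recoding rule as above: "Yes" only when sales strictly exceed 8
toy_high <- factor(ifelse(toy_sales <= 8, "No", "Yes"))

# Combine the original column and the new factor into one data frame
toy <- data.frame(Sales = toy_sales, High = toy_high)

# Count how many observations fall in each class
table(toy$High)
```

Because the condition is Sales <= 8, an observation equal to 8 is labeled No, which matches the "exceeds 8" wording in the text.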
We now use the tree() function to fit a classification tree in order to predict
High using all variables but Sales. The syntax of the tree() function is quite
similar to that of the lm() function.
tree.carseats <- tree(High ~ . - Sales, Carseats)
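To illustrate the shared formula syntax, here is a minimal sketch using lm() on the built-in mtcars data rather than Carseats. The choice of data set and the names fit_all and fit_drop are assumptions made for illustration only; the point is that `.` stands for all other columns and `- wt` removes one of them, just as `- Sales` does above.

```r
# Use every other column of mtcars as a predictor of mpg
fit_all <- lm(mpg ~ ., data = mtcars)

# Same formula, but exclude wt from the predictors
fit_drop <- lm(mpg ~ . - wt, data = mtcars)

# wt appears among the coefficients of the first fit only
names(coef(fit_all))
names(coef(fit_drop))
```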
The summary() function lists the variables that are used as internal nodes
in the tree, the number of terminal nodes, and the (training) error rate.
summary(tree.carseats)
Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Population"  "Advertising" "Age"         "US"         
Number of terminal nodes:  27 
Residual mean deviance:  0.4575 = 170.7 / 373 
Misclassification error rate: 0.09 = 36 / 400
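We see that the training error rate is 9%. For classification trees, the deviance
reported in the output of summary() is given by
\[ -2 \sum_{m} \sum_{k} n_{mk} \log \hat{p}_{mk}, \]
where \(n_{mk}\) is the number of observations in the \(m\)th terminal node that belong to the \(k\)th class, and \(\hat{p}_{mk}\) is the proportion of observations in the \(m\)th terminal node that belong to the \(k\)th class. A small deviance indicates a tree that provides a good fit to the (training) data. The residual mean deviance reported is simply the deviance divided by \(n - | T_0 |\), where \(n\) is the number of observations and \(| T_0 |\) is the number of terminal nodes; in this case \(n - | T_0 |\) is 400 − 27 = 373.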
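The deviance, computed as \(-2 \sum_m \sum_k n_{mk} \log \hat{p}_{mk}\), can be worked through by hand on a small table of counts. The sketch below uses base R and a hypothetical tree with three terminal nodes and two classes; the counts are invented for illustration and do not come from the Carseats fit.

```r
# Hypothetical class counts: rows are terminal nodes m, columns are classes k
n_mk <- matrix(c(10, 0,
                  3, 7,
                  5, 5),
               nrow = 3, byrow = TRUE,
               dimnames = list(node = 1:3, class = c("No", "Yes")))

# Class proportions within each terminal node
p_mk <- n_mk / rowSums(n_mk)

# Cells with n_mk = 0 contribute nothing (taking 0 * log(0) as 0)
terms <- ifelse(n_mk > 0, n_mk * log(p_mk), 0)

# Deviance: -2 times the sum over all nodes and classes
dev <- -2 * sum(terms)

# Residual mean deviance: deviance divided by n - |T_0|
n <- sum(n_mk)           # total number of observations
n_leaves <- nrow(n_mk)   # number of terminal nodes
mean_dev <- dev / (n - n_leaves)

dev
mean_dev
```

A pure node, such as the first row here, adds nothing to the deviance, which is why a tree with purer terminal nodes has a smaller deviance.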
For this and the following exercises, we use the OJ dataset from the ISLR2 library. The dataset contains sales information in which the customer purchased either Citrus Hill or Minute Maid orange juice.
Fit a classification tree with Purchase as the dependent variable and all other variables as independent variables. Store the model in the variable tree.oj.
Assume that the ISLR2 library has been loaded, the tree library has been installed and loaded, and the OJ dataset has been loaded and attached.
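One possible approach mirrors the Carseats fit above. This is a sketch rather than an official solution, and it assumes the ISLR2 and tree packages are installed; the library() calls are repeated only so that the block runs on its own.

```r
library(ISLR2)  # provides the OJ data set
library(tree)   # provides tree()

# Predict Purchase from all other variables in OJ
tree.oj <- tree(Purchase ~ ., data = OJ)

# Variables used, number of terminal nodes, and training error rate
summary(tree.oj)
```

Passing data = OJ explicitly means the fit does not depend on OJ having been attached.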