Here we fit a regression tree to the Boston data set. First, we create a training set, and fit the tree to the training data.

train <- sample(1:nrow(Boston), nrow(Boston) / 2)
tree.boston <- tree(medv ~ ., Boston, subset = train)
summary(tree.boston)

Regression tree:
tree(formula = medv ~ ., data = Boston, subset = train)
Variables actually used in tree construction:
[1] "rm"    "lstat" "dis"  
Number of terminal nodes:  8 
Residual mean deviance:  15.16 = 3713 / 245 
Distribution of residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-18.3800  -2.3390  -0.1132   0.0000   2.2770  14.3600  
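
The "Residual mean deviance" line can be checked by hand. The sketch below uses only the three figures printed in the output above and assumes, as the tree package's summary does for regression trees, that the denominator is the number of training observations minus the number of terminal nodes. The variable names are illustrative and not part of the lab.

```r
# Figures copied from the summary() output above
deviance_total <- 3713   # total deviance: sum of squared residuals on the training set
n_leaves <- 8            # number of terminal nodes
df_resid <- 245          # denominator reported by summary()

# Assumption: df_resid = (training observations) - (terminal nodes)
n_train <- df_resid + n_leaves

# Residual mean deviance is the total deviance divided by df_resid
mean_dev <- deviance_total / df_resid

print(n_train)
print(round(mean_dev, 2))
```

Under that assumption the implied training-set size is half of the rows of Boston, which agrees with how train was drawn.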

Notice that the output of summary() indicates that only three of the variables have been used in constructing the tree. In the context of a regression tree, the deviance is simply the sum of squared errors for the tree. We now plot the tree.

plot(tree.boston)
text(tree.boston, pretty = 0)


The variable lstat measures the percentage of individuals with lower socioeconomic status. The tree indicates that lower values of lstat correspond to more expensive houses. The tree predicts a median house price of $35,640 for larger homes in suburbs in which residents have high socioeconomic status (rm>=6.924, rm<7.3935 and lstat<8.845).
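
To make the reading of a single branch concrete, the path quoted above can be written as a plain rule. This is a minimal sketch, not the fitted tree: predict_quoted_leaf is a hypothetical helper that encodes only the one leaf described in the text and returns NA everywhere else, since the other branches are not reproduced here. It also assumes medv is recorded in thousands of dollars, so 35.64 corresponds to $35,640.

```r
# Hypothetical helper: encodes only the single path quoted in the text.
# All other leaves of the fitted tree are unknown here, so they map to NA.
predict_quoted_leaf <- function(rm, lstat) {
  in_leaf <- rm >= 6.924 & rm < 7.3935 & lstat < 8.845
  ifelse(in_leaf, 35.64, NA_real_)  # assumed units: thousands of dollars
}

# A suburb that satisfies all three conditions falls in the quoted leaf
predict_quoted_leaf(rm = 7.0, lstat = 5.0)

# A suburb with smaller homes follows a different branch (not encoded here)
predict_quoted_leaf(rm = 6.0, lstat = 5.0)
```

In practice the full set of rules is applied by calling predict() on the fitted tree object rather than by hand-coding branches.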


