In this section we use the BART package, and within it the gbart() function, to fit a Bayesian additive regression tree model to the Boston housing data set. The gbart() function is designed for quantitative outcome variables. For binary outcomes, lbart() and pbart() are available.

To run the gbart() function, we must first create matrices of predictors for the training and test data. We run BART with default settings.

library(BART)
x <- Boston[, 1:12]
y <- Boston[, "medv"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
set.seed (1)
bartfit <- gbart(xtrain , ytrain , x.test = xtest)

Next we compute the test error.

> yhat.bart <- bartfit$yhat.test.mean
> mean((ytest - yhat.bart)^2)
[1] 12.14

On this data set, the test error of BART is lower than the test error of random forests and boosting.

Now we can check how many times each variable appeared in the collection of trees.

> ord <- order(bartfit$varcount.mean , decreasing = T)
> bartfit$varcount.mean[ord]
    nox     rad     tax    chas   lstat ptratio
 23.546  22.679  22.272  21.762  21.631  21.041
    age      rm      zn   indus     dis    crim 
 20.851  18.454  17.685  17.000  16.505   7.365

Questions

Assume that: