In this section we use the BART package, and within it the gbart() function, to fit a Bayesian additive regression tree model to the Boston housing data set. The gbart() function is designed for quantitative outcome variables. For binary outcomes, lbart() and pbart() are available.

To run the gbart() function, we must first create matrices of predictors for the training and test data. We run BART with default settings.

library(BART)
x <- Boston[, 1:12]
y <- Boston[, "medv"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
set.seed (1)
bartfit <- gbart(xtrain , ytrain , x.test = xtest)

Next we compute the test error.

> yhat.bart <- bartfit$yhat.test.mean
> mean((ytest - yhat.bart)^2)
[1] 12.14

On this data set, the test error of BART is lower than the test error of random forests and boosting.

Now we can check how many times each variable appeared in the collection of trees.

> ord <- order(bartfit$varcount.mean , decreasing = T)
> bartfit$varcount.mean[ord]
    nox     rad     tax    chas   lstat ptratio
 23.546  22.679  22.272  21.762  21.631  21.041
    age      rm      zn   indus     dis    crim 
 20.851  18.454  17.685  17.000  16.505   7.365

Questions

Below a training set is defined for the Hitter data set, stored in the variable train.
Build a BART with Salary as the dependent variable and all other variables except NewLeague as the independent variables.
Store the matrices of predictors and dependent variable for the training and test data in xtrain, ytrain, xtest, and ytest, respectively.
Store the model in the variable bartfit.
Store the MSE of the model in bart.mse.

Assume that:

The ISLR2 and BART libraries have been loaded
The Hitters data set has been loaded and attached