In this section we use the BART package, and within it the gbart()
function,
to fit a Bayesian additive regression tree model to the Boston housing data
set. The gbart()
function is designed for quantitative outcome variables.
For binary outcomes, lbart()
and pbart()
are available.
To run the gbart()
function, we must first create matrices of predictors
for the training and test data. We run BART with default settings.
library(BART)
x <- Boston[, 1:12]
y <- Boston[, "medv"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
set.seed (1)
bartfit <- gbart(xtrain , ytrain , x.test = xtest)
Next we compute the test error.
> yhat.bart <- bartfit$yhat.test.mean
> mean((ytest - yhat.bart)^2)
[1] 12.14
On this data set, the test error of BART is lower than the test error of random forests and boosting.
Now we can check how many times each variable appeared in the collection of trees.
> ord <- order(bartfit$varcount.mean , decreasing = T)
> bartfit$varcount.mean[ord]
nox rad tax chas lstat ptratio
23.546 22.679 22.272 21.762 21.631 21.041
age rm zn indus dis crim
20.851 18.454 17.685 17.000 16.505 7.365
train
.Salary
as the dependent variable and all other variables except NewLeague
as the independent variables.xtrain
, ytrain
, xtest
, and ytest
, respectively.bartfit
.bart.mse
.Assume that:
ISLR2
and BART
libraries have been loadedHitters
data set has been loaded and attached