In this section we use the BART package, and within it the gbart() function,
to fit a Bayesian additive regression tree model to the Boston housing data
set. The gbart() function is designed for quantitative outcome variables.
For binary outcomes, lbart() and pbart() are available.
To run the gbart() function, we must first create matrices of predictors
for the training and test data. We run BART with default settings.
library(BART)
x <- Boston[, 1:12]
y <- Boston[, "medv"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
set.seed (1)
bartfit <- gbart(xtrain , ytrain , x.test = xtest)
Next we compute the test error.
> yhat.bart <- bartfit$yhat.test.mean
> mean((ytest - yhat.bart)^2)
[1] 12.14
On this data set, the test error of BART is lower than the test error of random forests and boosting.
Now we can check how many times each variable appeared in the collection of trees.
> ord <- order(bartfit$varcount.mean , decreasing = T)
> bartfit$varcount.mean[ord]
nox rad tax chas lstat ptratio
23.546 22.679 22.272 21.762 21.631 21.041
age rm zn indus dis crim
20.851 18.454 17.685 17.000 16.505 7.365
Given a training and test set defined for the Hitter data set:
set.seed(1)
Hitters <- na.omit(Hitters)
train <- sample(seq_len(nrow(Hitters)), nrow(Hitters) / 2)
x <- Hitters[, 1:18]
y <- Hitters[, "Salary"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]
Salary as the dependent variable and all other variables except NewLeague as the independent variables.bartfit.bart.mse.Give only the output of bart.mse, rounded to two digits after the comma. Do not load or run any other code. For example:
bart.mse <- 19.98