We could change the number of trees grown by randomForest() using the ntree argument:
bag.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 13, ntree = 25)
yhat.bag <- predict(bag.boston, newdata = Boston[-train,])
mean((yhat.bag - boston.test)^2)
[1] 23.66716
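The effect of ntree on test error can be explored more systematically. The following is a minimal sketch, assuming Boston, train, and boston.test are defined as above; it records the test MSE of the bagged model for several forest sizes:

library(randomForest)
set.seed(1)
# Test MSE of the bagged model (mtry = 13, i.e. all predictors) for several forest sizes
ntrees <- c(25, 100, 500)
test.errors <- sapply(ntrees, function(n) {
  fit <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 13, ntree = n)
  yhat <- predict(fit, newdata = Boston[-train, ])
  mean((yhat - boston.test)^2)
})
data.frame(ntree = ntrees, test.mse = test.errors)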
Growing a random forest proceeds in exactly the same way, except that we use a smaller value of the mtry argument. By default, randomForest() uses \(p/3\) variables when building a random forest of regression trees, and \(\sqrt{p}\) variables when building a random forest of classification trees. Here we use mtry = 6.
set.seed(1)
rf.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 6, importance = TRUE)
yhat.rf <- predict(rf.boston, newdata = Boston[-train,])
mean((yhat.rf - boston.test)^2)
[1] 19.62021
The test set MSE is 19.62; this indicates that random forests yielded an improvement over bagging in this case.
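Since we set importance = TRUE when fitting, we can also view the importance of each variable; a brief sketch using the importance() and varImpPlot() functions from the randomForest package:

# Two importance measures: mean decrease in prediction accuracy (%IncMSE)
# and mean decrease in node impurity from splits on that variable (IncNodePurity)
importance(rf.boston)
# Plot both measures for each predictor
varImpPlot(rf.boston)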
As an exercise, split the data into train and test sets (a 70-30 split) and fit a random forest with mpg as the dependent variable and all other variables as independent variables, setting importance to TRUE. Store the fitted model in fit and the test set MSE in test.mse. Assume that:
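One way to set up this exercise, as a rough sketch: the fragment above names only the mpg variable, so the Auto data set is an assumption here, while fit and test.mse are the object names the exercise mentions.

library(ISLR)           # assumed source of the Auto data set
library(randomForest)
set.seed(1)
# 70-30 train-test split
train <- sample(nrow(Auto), floor(0.7 * nrow(Auto)))
# Random forest with mpg as the response; the name variable is dropped
# because randomForest() cannot handle factors with more than 53 levels
fit <- randomForest(mpg ~ . - name, data = Auto, subset = train, importance = TRUE)
# Test set MSE
yhat <- predict(fit, newdata = Auto[-train, ])
test.mse <- mean((yhat - Auto$mpg[-train])^2)
test.mse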