We could change the number of trees grown by randomForest() using the ntree argument:
bag.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 13, ntree = 25)
yhat.bag <- predict(bag.boston, newdata = Boston[-train,])
mean((yhat.bag - boston.test)^2)
[1] 23.66716
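The effect of ntree on test error can be explored more systematically. The following is a minimal sketch, assuming Boston, train, and boston.test are defined as above; it records the test MSE of the bagged model for several forest sizes:

library(randomForest)
set.seed(1)
# Test MSE of the bagged model (mtry = 13, i.e. all predictors) for several forest sizes
ntrees <- c(25, 100, 500)
test.errors <- sapply(ntrees, function(n) {
  fit <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 13, ntree = n)
  yhat <- predict(fit, newdata = Boston[-train, ])
  mean((yhat - boston.test)^2)
})
data.frame(ntree = ntrees, test.mse = test.errors)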
Growing a random forest proceeds in exactly the same way, except that we use a smaller value of the mtry argument. By default, randomForest() uses \(p/3\) variables when building a random forest of regression trees, and \(\sqrt{p}\) variables when building a random forest of classification trees. Here we use mtry = 6.
set.seed(1)
rf.boston <- randomForest(medv ~ ., data = Boston, subset = train, mtry = 6, importance = TRUE)
yhat.rf <- predict(rf.boston, newdata = Boston[-train,])
mean((yhat.rf - boston.test)^2)
[1] 19.62021
The test set MSE is 19.62; this indicates that random forests yielded an improvement over bagging in this case.
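Since we set importance = TRUE when fitting, we can also view the importance of each variable; a brief sketch using the importance() and varImpPlot() functions from the randomForest package:

# Two importance measures: mean decrease in prediction accuracy (%IncMSE)
# and mean decrease in node impurity from splits on that variable (IncNodePurity)
importance(rf.boston)
# Plot both measures for each predictor
varImpPlot(rf.boston)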
As an exercise, split the data into train and test sets (a 70-30 split) and fit a random forest with mpg as the dependent variable and all other variables as independent variables, setting importance to TRUE. Store the fitted model in fit and the test set MSE in test.mse. Assume that:
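One way to set up this exercise, as a rough sketch: the fragment above names only the mpg variable, so the Auto data set is an assumption here, while fit and test.mse are the object names the exercise mentions.

library(ISLR)           # assumed source of the Auto data set
library(randomForest)
set.seed(1)
# 70-30 train-test split
train <- sample(nrow(Auto), floor(0.7 * nrow(Auto)))
# Random forest with mpg as the response; the name variable is dropped
# because randomForest() cannot handle factors with more than 53 levels
fit <- randomForest(mpg ~ . - name, data = Auto, subset = train, importance = TRUE)
# Test set MSE
yhat <- predict(fit, newdata = Auto[-train, ])
test.mse <- mean((yhat - Auto$mpg[-train])^2)
test.mse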