We now compare boosting, bagging, and random forests with simpler regression methods.

We will use the Weekly data set from the ISLR2 package and try to predict the Direction variable using the Lag1-Lag5 and Volume variables. We create training and test sets as follows:

library(ISLR2)

set.seed(1)
# Recode Direction as 0/1 so gbm() can fit a Bernoulli model
Weekly$Direction <- ifelse(Weekly$Direction == "Up", 1, 0)
# Drop variables we will not use as predictors
Weekly$Year <- NULL
Weekly$Today <- NULL
# 70/30 train/test split
train <- sample(nrow(Weekly), nrow(Weekly) * 0.7)
Weekly.train <- Weekly[train, ]
Weekly.test <- Weekly[-train, ]

Questions

  1. Build a logistic regression model. Build a confusion matrix using the test set, where you label a prediction as 1 if the predicted probability is higher than 0.5. Store the accuracy of the model in acc.log.

  2. Build a model with boosting. Use a seed value of 2, the Bernoulli distribution, and 1000 trees. Store the accuracy of the model in acc.boost.

  3. Build a model with bagging. Use a seed value of 3 and 1000 trees. Make sure your dependent variable is of the correct data type so that the randomForest() function selects the correct approach. Store the accuracy of the model in acc.bag.

  4. Build a random forest model with the default setting for mtry. Use a seed value of 4 and 1000 trees. Store the accuracy of the model in acc.rf.
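One possible approach to the four questions is sketched below. This is a sketch, not the only valid solution; it assumes the gbm and randomForest packages are installed, that the setup code above has already been run, and it uses helper names (acc, log.fit, boost.fit, bag.fit, rf.fit) chosen here for illustration.

```r
library(gbm)
library(randomForest)

# Helper: fraction of predictions that match the truth
acc <- function(pred, truth) mean(pred == truth)

## 1. Logistic regression
log.fit  <- glm(Direction ~ ., data = Weekly.train, family = binomial)
log.prob <- predict(log.fit, Weekly.test, type = "response")
log.pred <- ifelse(log.prob > 0.5, 1, 0)
table(log.pred, Weekly.test$Direction)   # confusion matrix
acc.log <- acc(log.pred, Weekly.test$Direction)

## 2. Boosting (Direction is already 0/1, as gbm's Bernoulli loss requires)
set.seed(2)
boost.fit  <- gbm(Direction ~ ., data = Weekly.train,
                  distribution = "bernoulli", n.trees = 1000)
boost.prob <- predict(boost.fit, Weekly.test, n.trees = 1000,
                      type = "response")
boost.pred <- ifelse(boost.prob > 0.5, 1, 0)
acc.boost  <- acc(boost.pred, Weekly.test$Direction)

## 3. Bagging: a random forest with mtry equal to the number of predictors.
## Direction must be a factor so randomForest() does classification,
## not regression.
Weekly.train$Direction <- as.factor(Weekly.train$Direction)
Weekly.test$Direction  <- as.factor(Weekly.test$Direction)
set.seed(3)
bag.fit  <- randomForest(Direction ~ ., data = Weekly.train,
                         mtry = ncol(Weekly.train) - 1, ntree = 1000)
bag.pred <- predict(bag.fit, Weekly.test)
acc.bag  <- acc(bag.pred, Weekly.test$Direction)

## 4. Random forest with the default mtry (sqrt(p) for classification)
set.seed(4)
rf.fit  <- randomForest(Direction ~ ., data = Weekly.train, ntree = 1000)
rf.pred <- predict(rf.fit, Weekly.test)
acc.rf  <- acc(rf.pred, Weekly.test$Direction)
```

Note that bagging is just a random forest in which every predictor is considered at each split, which is why the same randomForest() function serves for both questions 3 and 4.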


Assume that: