We now use boosting to predict Salary in the Hitters data set.
Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.
Using the is.na() function, remove the observations with missing values from the Hitters data frame, then log-transform the salaries.
Create a training set (Hitters.train) consisting of the first 200 observations, and a test set (Hitters.test) consisting of the remaining observations.
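One possible way to carry out the preparation steps above is sketched here. It drops every row that contains at least one NA (in any column, not only Salary), which is one reading of "observations that have missing values"; treat it as a sketch rather than the reference solution.

```r
library(ISLR2)  # provides the Hitters data set

# Keep only rows without any missing value; is.na() returns a logical
# matrix, and rowSums() counts the NAs per row.
Hitters <- Hitters[rowSums(is.na(Hitters)) == 0, ]

# Log-transform the response.
Hitters$Salary <- log(Hitters$Salary)

# First 200 observations for training, the remainder for testing.
Hitters.train <- Hitters[1:200, ]
Hitters.test <- Hitters[-(1:200), ]
```

Negative indexing with -(1:200) selects every row except the first 200, so the test set adapts automatically to however many rows remain after the NA filter.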
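Perform boosting on the training set for each of the following values of the shrinkage parameter \(\lambda\):
pows <- seq(-10, -0.2, by = 0.5)
lambdas <- 10^pows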
Use “gaussian” for the distribution argument.
Set a seed value of 1.
Store the training MSE for each lambda in train.err and the test MSE in test.err (using a for-loop).
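The loop described above might look roughly like the following sketch. Two things in it are assumptions, not stated in the exercise text: the number of trees (n.trees = 1000) and the choice to set the seed once before the loop instead of once per fit. The data preparation is repeated so that the block runs on its own.

```r
library(ISLR2)
library(gbm)

# Data preparation (repeated here so the block is self-contained).
Hitters <- Hitters[rowSums(is.na(Hitters)) == 0, ]
Hitters$Salary <- log(Hitters$Salary)
Hitters.train <- Hitters[1:200, ]
Hitters.test <- Hitters[-(1:200), ]

pows <- seq(-10, -0.2, by = 0.5)
lambdas <- 10^pows

n.trees <- 1000  # ASSUMPTION: the exercise text does not give this value

train.err <- rep(NA, length(lambdas))
test.err <- rep(NA, length(lambdas))

set.seed(1)  # ASSUMPTION: seed set once, before the loop
for (i in seq_along(lambdas)) {
  boost.fit <- gbm(Salary ~ ., data = Hitters.train,
                   distribution = "gaussian",
                   n.trees = n.trees,
                   shrinkage = lambdas[i])

  # Predictions from the full ensemble on both sets.
  pred.train <- predict(boost.fit, newdata = Hitters.train, n.trees = n.trees)
  pred.test <- predict(boost.fit, newdata = Hitters.test, n.trees = n.trees)

  # Mean squared error on the log-salary scale.
  train.err[i] <- mean((pred.train - Hitters.train$Salary)^2)
  test.err[i] <- mean((pred.test - Hitters.test$Salary)^2)
}
```

Because gbm() subsamples the training data by default, the exact error values depend on where the seed is set, so results can differ slightly from a solution that reseeds inside the loop.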
Produce a plot with different shrinkage values on the \(x\)-axis and the corresponding training set MSE on the \(y\)-axis.
Make the same plot for the test MSE.
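Both plots can be produced with base graphics. The helper below, plot_mse, is a hypothetical name introduced only for this illustration. It draws the shrinkage axis on a log scale, a design choice (not required by the exercise) that keeps the small values of lambda from collapsing near zero.

```r
# Hypothetical helper (not part of the exercise): MSE against shrinkage.
plot_mse <- function(lambdas, err, ylab) {
  plot(lambdas, err,
       type = "b",   # points joined by lines
       log = "x",    # log-scaled x-axis; drop this for a linear axis
       xlab = "Shrinkage (lambda)", ylab = ylab)
}

# Usage, once train.err and test.err have been filled in by the loop:
# plot_mse(lambdas, train.err, "Training MSE")
# plot_mse(lambdas, test.err, "Test MSE")
```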
From the test MSEs above, derive the best value for lambda, that is, the one with the lowest test MSE. Store this value in lambda.boost.
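Taking "best" to mean "lowest test MSE", the selection can be sketched with which.min(). The function name best_lambda is hypothetical and used only for illustration.

```r
# Hypothetical helper: return the shrinkage value with the smallest error.
# which.min() gives the index of the first minimum, so ties resolve to the
# smallest index.
best_lambda <- function(lambdas, err) {
  lambdas[which.min(err)]
}

# Usage, once test.err has been computed:
# lambda.boost <- best_lambda(lambdas, test.err)
```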
Use lambda.boost as the value of \(\lambda\).
Assume that the ISLR2 and gbm libraries have been loaded.