We now use boosting to predict Salary in the Hitters data set.
Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.
Remove the observations from the Hitters data.frame that have missing values using the is.na() function, and then log-transform the salaries.
Create a training set (Hitters.train) consisting of the first 200 observations, and a test set (Hitters.test) consisting of the remaining observations.
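The preparation steps above could be sketched as follows. This is only one possible approach: it assumes the ISLR2 package is installed, overwrites Hitters in place, and uses rowSums() over is.na() to find incomplete rows.

```r
library(ISLR2)  # provides the Hitters data frame

# Keep only the rows without any missing value
Hitters <- Hitters[rowSums(is.na(Hitters)) == 0, ]

# Log-transform the salaries
Hitters$Salary <- log(Hitters$Salary)

# First 200 observations for training, the rest for testing
Hitters.train <- Hitters[1:200, ]
Hitters.test <- Hitters[-(1:200), ]
```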
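Perform boosting on the training set for a range of values of the shrinkage parameter \(\lambda\), defined as follows:
pows <- seq(-10, -0.2, by = 0.5)
lambdas <- 10^pows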
Use "gaussian" for the distribution argument.
Set a seed value of 1.
Store the training MSE for each lambda in train.err and the test MSE in test.err (using a for-loop).
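A minimal sketch of such a loop is given below. It makes two assumptions that the text above does not state: the number of trees (n.trees = 1000 here) and the seed being set once before the loop rather than in every iteration. The data preparation is repeated so that the block runs on its own.

```r
library(ISLR2)
library(gbm)

# Data preparation as described earlier in the exercise
Hitters <- Hitters[rowSums(is.na(Hitters)) == 0, ]
Hitters$Salary <- log(Hitters$Salary)
Hitters.train <- Hitters[1:200, ]
Hitters.test <- Hitters[-(1:200), ]

# Grid of shrinkage values
pows <- seq(-10, -0.2, by = 0.5)
lambdas <- 10^pows

train.err <- rep(NA, length(lambdas))
test.err <- rep(NA, length(lambdas))

n.trees <- 1000  # assumption: not specified in the exercise text

set.seed(1)  # assumption: seed set once, before the loop
for (i in seq_along(lambdas)) {
  fit <- gbm(Salary ~ ., data = Hitters.train,
             distribution = "gaussian",
             n.trees = n.trees,
             shrinkage = lambdas[i])

  pred.train <- predict(fit, Hitters.train, n.trees = n.trees)
  pred.test <- predict(fit, Hitters.test, n.trees = n.trees)

  # Mean squared error on the log-salary scale
  train.err[i] <- mean((pred.train - Hitters.train$Salary)^2)
  test.err[i] <- mean((pred.test - Hitters.test$Salary)^2)
}
```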
Produce a plot with different shrinkage values on the \(x\)-axis and the corresponding training set MSE on the \(y\)-axis.
Make the same plot for the test MSE.
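One way to draw both plots is with a small helper around base plot(). The function name plot_mse is hypothetical, and the logarithmic x-axis is a design choice rather than a requirement of the exercise: the shrinkage values span many orders of magnitude, so a linear axis would squeeze most points together.

```r
# Hypothetical helper: plot a vector of MSE values against the shrinkage values
plot_mse <- function(lambdas, err, ylab) {
  plot(lambdas, err, type = "b", log = "x",
       xlab = "Shrinkage", ylab = ylab)
}

# Usage once train.err and test.err have been computed:
# plot_mse(lambdas, train.err, "Training MSE")
# plot_mse(lambdas, test.err, "Test MSE")
```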
From the test MSE above, derive the best value for lambda. Store this value in lambda.boost.
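Taking "best" to mean the shrinkage value with the lowest test MSE, the selection can be done with which.min(). In the sketch below, the test.err values are synthetic placeholders used only to make the example runnable; in the exercise, test.err comes from the boosting loop.

```r
lambdas <- 10^seq(-10, -0.2, by = 0.5)

# Placeholder errors for illustration only (NOT real results)
test.err <- (log10(lambdas) + 2)^2

# Shrinkage value at the position of the smallest test MSE
lambda.boost <- lambdas[which.min(test.err)]
```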
lambda.boost for \(\lambda\).
Assume that the ISLR2 and gbm libraries have been loaded.