In this exercise, we will predict the number of applications (Apps) received using the other variables in the College data set.
The data set is available in the ISLR2 package.
Note: this exercise is primarily based on the lab of chapter 6 (Linear Model Selection and Regularization), which builds on the labs of chapters 3 and 5.
Split the data set into a training set and a test set using a 60/40 split.
Set a seed value of 42 and use the sample() function. Store the training and test data frames in college.train and college.test, respectively.
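One possible way to make the split, as a minimal sketch: it assumes the ISLR2 package is installed and draws 60% of the row indices with sample(). The exercise does not say which form of the sample() call it expects, so a different call (for example, sampling a logical vector) would select different rows even with the same seed.

```r
library(ISLR2)  # provides the College data set

set.seed(42)
n <- nrow(College)
# Draw 60% of the row indices (without replacement) for the training set
train <- sample(n, round(0.6 * n))
college.train <- College[train, ]
college.test  <- College[-train, ]  # the remaining 40% of the rows
```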
Fit a linear regression model using least squares on the training set.
Use Apps as the dependent variable and all other variables as predictors.
Store the model in lm.fit, the test set predictions in pred.lm and the test error (MSE) in lm.error.
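A minimal sketch of the least squares step. So that it runs on its own, the block repeats a 60/40 split using one hypothetical form of the sample() call; the test MSE is the mean of \((y_i - \hat{y}_i)^2\) over the test observations.

```r
library(ISLR2)

# Repeat a 60/40 split (assumed form of the sample() call)
set.seed(42)
n <- nrow(College)
train <- sample(n, round(0.6 * n))
college.train <- College[train, ]
college.test  <- College[-train, ]

# Least squares fit: Apps as the response, every other column as a predictor
lm.fit <- lm(Apps ~ ., data = college.train)

# Predict Apps for the test rows and average the squared errors (test MSE)
pred.lm  <- predict(lm.fit, newdata = college.test)
lm.error <- mean((college.test$Apps - pred.lm)^2)
lm.error
```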
Prepare the data for use with a lasso or ridge regression model from the glmnet package.
That is, convert the data frames to matrices using the model.matrix() function.
Note that these matrices hold only the independent variables.
Store the matrices in train.x and test.x. Also, store the dependent variable in train.y and test.y.
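A sketch of the matrix preparation, assuming the same hypothetical split as before (repeated so the block runs alone). model.matrix() turns the factor Private into a dummy column and adds an intercept column; following the convention of the chapter 6 lab, the sketch drops that intercept column with [, -1], since glmnet fits its own intercept. Whether the exercise checker expects the column to be dropped is an assumption.

```r
library(ISLR2)

# Repeat a 60/40 split (assumed form of the sample() call)
set.seed(42)
n <- nrow(College)
train <- sample(n, round(0.6 * n))
college.train <- College[train, ]
college.test  <- College[-train, ]

# Predictor matrices: factors become dummy columns, intercept column removed
train.x <- model.matrix(Apps ~ ., data = college.train)[, -1]
test.x  <- model.matrix(Apps ~ ., data = college.test)[, -1]

# Response vectors
train.y <- college.train$Apps
test.y  <- college.test$Apps
```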
Fit a lasso regression model on the training set, with \(\lambda\) chosen by 5-fold cross-validation (note that the default is 10-fold CV!).
Use the following grid: grid <- 10 ^ seq(4, -2, length = 100).
Make sure the predictors are standardized and again set a seed value of 42.
Use the cv.glmnet() function from the glmnet package.
Store the lasso model in cv.lasso and the optimal \(\lambda\) value in bestlam.lasso.
Hint: in cv.glmnet(), supply the training data (train.x and train.y) and specify the values for the arguments alpha, lambda and nfolds.
Hint: the optimal \(\lambda\) value is stored as a component of the cv.lasso object.
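A minimal sketch of the cross-validated lasso fit, assuming ISLR2 and glmnet are installed and reusing the hypothetical split and matrix preparation (repeated so the block runs alone). alpha = 1 selects the lasso penalty, standardize = TRUE is glmnet's default but is written out to make the requirement explicit, and lambda.min is the value of \(\lambda\) with the lowest mean cross-validated error.

```r
library(ISLR2)
library(glmnet)

# Repeat the split and matrix preparation (assumed forms)
set.seed(42)
n <- nrow(College)
train <- sample(n, round(0.6 * n))
college.train <- College[train, ]
train.x <- model.matrix(Apps ~ ., data = college.train)[, -1]
train.y <- college.train$Apps

grid <- 10 ^ seq(4, -2, length = 100)

# Reset the seed so the fold assignment is reproducible
set.seed(42)
cv.lasso <- cv.glmnet(train.x, train.y, alpha = 1, lambda = grid,
                      nfolds = 5, standardize = TRUE)

# lambda with the smallest mean cross-validated error
bestlam.lasso <- cv.lasso$lambda.min
bestlam.lasso
```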
Make predictions on the test set with the optimal \(\lambda\) value.
Store the predictions in pred.lasso, the test MSE in lasso.error and the coefficient estimates in coef.lasso.
Hint: in predict() for pred.lasso, supply the CV object and specify the values for the arguments s and newx.
Hint: in predict() for coef.lasso, supply the CV object and specify the values for the arguments s and type.
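A sketch of the prediction step, under the same assumptions as the earlier steps (hypothetical split call, intercept column dropped), repeated so it runs on its own. predict() on a cv.glmnet object accepts a numeric s; type = "coefficients" returns a sparse one-column matrix with the intercept in the first row, and predictors the lasso shrinks to zero show up as zero entries.

```r
library(ISLR2)
library(glmnet)

# Repeat the split, matrix preparation and CV fit (assumed forms)
set.seed(42)
n <- nrow(College)
train <- sample(n, round(0.6 * n))
college.train <- College[train, ]
college.test  <- College[-train, ]
train.x <- model.matrix(Apps ~ ., data = college.train)[, -1]
test.x  <- model.matrix(Apps ~ ., data = college.test)[, -1]
train.y <- college.train$Apps
test.y  <- college.test$Apps
grid <- 10 ^ seq(4, -2, length = 100)
set.seed(42)
cv.lasso <- cv.glmnet(train.x, train.y, alpha = 1, lambda = grid, nfolds = 5)
bestlam.lasso <- cv.lasso$lambda.min

# Test set predictions at the CV-optimal lambda and the resulting test MSE
pred.lasso  <- predict(cv.lasso, s = bestlam.lasso, newx = test.x)
lasso.error <- mean((test.y - pred.lasso)^2)

# Coefficient estimates (intercept plus one per predictor) at the same lambda
coef.lasso <- predict(cv.lasso, s = bestlam.lasso, type = "coefficients")
coef.lasso
```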
Assume that the ISLR2 library has been loaded, the glmnet library has been loaded, and the College dataset has been loaded and attached.