We will use the glmnet package to perform ridge regression and the lasso.
The main function in this package is glmnet(), which can be used to fit
ridge regression models, lasso models, and more. This function has
slightly different syntax from other model-fitting functions that we have
encountered thus far in this book. In particular, we must pass in an x
matrix as well as a y vector, and we do not use the y ~ x syntax. We will
now perform ridge regression and the lasso in order to predict Salary on
the Hitters data. Before proceeding, ensure that the missing values have
been removed from the data, as described in Section 6.5.
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary
The model.matrix() function is particularly useful for creating x; not only
does it produce a matrix corresponding to the 19 predictors, but it also
automatically transforms any qualitative variables into dummy variables.
The latter property is important because glmnet() can only take numerical,
quantitative inputs.
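The dummy-variable conversion can be seen on a small example. The sketch below uses a made-up data frame (toy, with hypothetical values, not the Hitters data) containing one qualitative predictor and one missing response, and assumes only base R.

```r
# Toy data frame with hypothetical values: one factor, one missing Salary
toy <- data.frame(
  Salary = c(475, 480, NA, 500, 91.5),
  Hits   = c(81, 130, 141, 87, 169),
  League = factor(c("N", "A", "N", "N", "A"))
)

# Remove incomplete rows first so that x and y stay aligned
toy <- na.omit(toy)

# Build the predictor matrix; [, -1] drops the intercept column
x_toy <- model.matrix(Salary ~ ., toy)[, -1]
y_toy <- toy$Salary

# The factor League has become a numeric 0/1 column named LeagueN
colnames(x_toy)   # "Hits" "LeagueN"
x_toy
```

The level "A" serves as the baseline, so only a single indicator column for "N" is created.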
The glmnet() function has an alpha argument that determines what type
of model is fit. If alpha=0 then a ridge regression model is fit, and if
alpha=1 then a lasso model is fit. We first fit a ridge regression model.
library(glmnet)
grid <- 10^seq(10, -2, length.out = 100)
ridge.mod <- glmnet(x, y, alpha = 0, lambda = grid)
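To illustrate what the alpha argument changes, the following sketch fits both model types to simulated data at a single, arbitrarily chosen value of lambda. The names x_sim, y_sim, and lambda_demo are hypothetical and are not part of the lab. The point of the comparison is the well-known difference between the two penalties: the lasso can set coefficients exactly to zero, whereas ridge regression only shrinks them.

```r
library(glmnet)

# Simulated data (an assumption for illustration): only the first two
# of ten predictors actually influence the response
set.seed(1)
n <- 100
p <- 10
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- 3 * x_sim[, 1] - 2 * x_sim[, 2] + rnorm(n)

lambda_demo <- 0.5  # arbitrary penalty value chosen for the demonstration

ridge_fit <- glmnet(x_sim, y_sim, alpha = 0, lambda = lambda_demo)  # ridge
lasso_fit <- glmnet(x_sim, y_sim, alpha = 1, lambda = lambda_demo)  # lasso

# Count slope coefficients (intercept excluded) that are exactly zero
ridge_zeros <- sum(as.matrix(coef(ridge_fit))[-1, 1] == 0)
lasso_zeros <- sum(as.matrix(coef(lasso_fit))[-1, 1] == 0)
ridge_zeros
lasso_zeros
```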
By default, the glmnet() function performs ridge regression for an
automatically selected range of \(\lambda\) values. However, here we have
chosen to implement the function over a grid of values ranging from
\(\lambda = 10^{10}\) to \(\lambda = 10^{-2}\), essentially covering the
full range of scenarios from the null model containing only the intercept,
to the least squares fit. As we will see, we can also compute model fits
for a particular value of \(\lambda\) that is not one of the original grid
values. Note that by default, the glmnet() function standardizes the
variables so that they are on the same scale. To turn off this default
setting, use the argument standardize=FALSE.
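The behaviour across the grid can be checked on simulated data. This is a minimal sketch, assuming the glmnet package is installed; x_sim, y_sim, and ridge_sim are hypothetical names, and the value s = 50 is just an example of a \(\lambda\) that is not on the grid.

```r
library(glmnet)

# Simulated data (an assumption for illustration): 5 predictors
set.seed(1)
x_sim <- matrix(rnorm(100 * 5), 100, 5)
y_sim <- x_sim[, 1] + rnorm(100)

grid <- 10^seq(10, -2, length.out = 100)
ridge_sim <- glmnet(x_sim, y_sim, alpha = 0, lambda = grid)

# One column of coefficients per lambda value;
# rows are the intercept plus the 5 predictors
dim(coef(ridge_sim))

# At the largest lambda the slopes are essentially zero (null model)
coef(ridge_sim)[, 1]

# At the smallest lambda the fit is close to least squares
coef(ridge_sim)[, 100]
coef(lm(y_sim ~ x_sim))

# Coefficients for a lambda that is not in the grid
coef_50 <- predict(ridge_sim, s = 50, type = "coefficients")
coef_50

# Same grid, but without the default standardization of the predictors
ridge_raw <- glmnet(x_sim, y_sim, alpha = 0, lambda = grid,
                    standardize = FALSE)
```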
With the Boston dataset, create a model matrix x with medv as the response
and all other variables as the predictors. Create a response variable y.
Create a ridge regression model using the predefined grid and store it in
ridge.mod.
Assume that the MASS and glmnet libraries have been loaded, and that the
Boston dataset has been loaded and attached.
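One possible way to approach this exercise is sketched below; it is not the only valid answer. The library() calls and the definition of grid are repeated here only to make the sketch self-contained, on the assumption that the predefined grid is the same one used for the Hitters fit.

```r
library(MASS)    # provides the Boston data set
library(glmnet)

# Predictor matrix: every variable except medv, intercept column dropped
x <- model.matrix(medv ~ ., Boston)[, -1]

# Response vector
y <- Boston$medv

# Assumed to match the grid defined earlier in the lab
grid <- 10^seq(10, -2, length.out = 100)

# alpha = 0 requests ridge regression
ridge.mod <- glmnet(x, y, alpha = 0, lambda = grid)

# One column of coefficients per value of lambda
dim(coef(ridge.mod))
```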