We start by fitting the models in Section 10.6. We set up the data, and separate out a training and test set.
library(ISLR2)
Gitters <- na.omit(Hitters)
n <- nrow(Gitters)
set.seed(13)
ntest <- trunc(n / 3)
testid <- sample(1:n, ntest)
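As a quick sanity check, one could confirm the sizes of the two sets before fitting anything; a minimal sketch (printed values not shown):
n                    # number of complete cases after na.omit()
length(testid)       # size of the held-out test set, about a third of n
n - length(testid)   # size of the training set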
The linear model should be familiar, but we present it anyway.
lfit <- lm(Salary ~ ., data = Gitters[-testid,])
lpred <- predict(lfit, Gitters[testid,])
with(Gitters[testid,], mean(abs(lpred - Salary)))
[1] 254.6687
Notice the use of the with() command: the first argument is a dataframe, and the second an expression that can refer to elements of the dataframe by name. In this instance the dataframe corresponds to the test data and the expression computes the mean absolute prediction error on this data.
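The same quantity can be computed without with() by indexing the Salary column directly; a minimal sketch:
# equivalent to the with() expression above
mean(abs(lpred - Gitters[testid, "Salary"]))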
Next we fit the lasso using glmnet. Since this package does not use formulas, we create x and y first.
x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))
y <- Gitters$Salary
The first line makes a call to model.matrix(), which produces the same matrix that was used by lm() (the -1 omits the intercept). This function automatically converts factors to dummy variables. The scale() function standardizes the matrix so each column has mean zero and variance one.
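One can verify the standardization directly; a minimal sketch (the printed values should be essentially zero and one, up to rounding):
# each column should have (approximately) mean zero and standard deviation one
round(colMeans(x), 10)
apply(x, 2, sd)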
library(glmnet)
cvfit <- cv.glmnet(x[-testid,], y[-testid],
    type.measure = "mae")
cpred <- predict(cvfit, x[testid,], s = "lambda.min")
mean(abs(y[testid] - cpred))
[1] 252.2994
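The value of lambda selected by cross-validation, and the coefficients at that value, can be inspected as well; a minimal sketch (output not shown):
cvfit$lambda.min                  # value of lambda minimizing the cross-validated MAE
coef(cvfit, s = "lambda.min")     # lasso coefficients at that value; some are zero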
To fit the neural network, we first set up a model structure that describes the network.
library(keras)
modnn <- keras_model_sequential() %>%
  layer_dense(units = 50, activation = "relu",
    input_shape = ncol(x)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 1)
We have created a vanilla model object called modnn, and have added details about the successive layers in a sequential manner, using the function keras_model_sequential(). The pipe operator %>% passes the previous term as the first argument to the next function, and returns the result. It allows us to specify the layers of a neural network in a readable form.
We illustrate the use of the pipe operator on a simple example. Earlier, we created x using the command
x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))
We first make a matrix, and then we standardize each of its columns. Compound expressions like this can be difficult to parse. We could have obtained the same result using the pipe operator:
x <- model.matrix(Salary ~ . - 1, data = Gitters) %>% scale()
Using the pipe operator makes it easier to follow the sequence of operations.
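The same idea applies to any chain of function calls; a minimal sketch with ordinary numeric functions:
# sum(sqrt(c(1, 4, 9))) written as a left-to-right pipeline
c(1, 4, 9) %>% sqrt() %>% sum()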
We now return to our neural network. The object modnn has a single hidden layer with 50 hidden units, and a ReLU activation function. It then has a dropout layer, in which a random 40% of the 50 activations from the previous layer are set to zero during each iteration of the stochastic gradient descent algorithm. Finally, the output layer has just one unit with no activation function, indicating that the model provides a single quantitative output.
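We can print a summary of the model to see the output shape and parameter count of each layer; a minimal sketch (output not shown):
summary(modnn)   # layer-by-layer output shapes and parameter counts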
As a variation, one could repeat this exercise on the Hitters dataset with the same preprocessing steps, but replace modnn with a model that has a single hidden layer of 100 hidden units, and a tanh activation function.
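A minimal sketch of that alternative architecture, built with the same keras functions as above (the name modnn2 is ours, chosen for illustration):
# hypothetical variant: one hidden layer of 100 tanh units, same inputs as modnn
modnn2 <- keras_model_sequential() %>%
  layer_dense(units = 100, activation = "tanh",
    input_shape = ncol(x)) %>%
  layer_dense(units = 1)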