We start by fitting the models in Section 10.6. We set up the data, and separate out a training and test set.
library(ISLR2)
Gitters <- na.omit(Hitters)
n <- nrow(Gitters)
set.seed(13)
ntest <- trunc(n / 3)
testid <- sample(1:n, ntest)
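As a quick sanity check, one could confirm the sizes of the two sets before fitting anything; a minimal sketch (printed values not shown):
n                    # number of complete cases after na.omit()
length(testid)       # size of the held-out test set, about a third of n
n - length(testid)   # size of the training set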
The linear model should be familiar, but we present it anyway.
lfit <- lm(Salary ~ ., data = Gitters[-testid,])
lpred <- predict(lfit, Gitters[testid,])
with(Gitters[testid,], mean(abs(lpred - Salary)))
[1] 254.6687
Notice the use of the with() command: the first argument is a dataframe, and the second an expression that can refer to elements of the dataframe by name. In this instance the dataframe corresponds to the test data and the expression computes the mean absolute prediction error on this data.
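The same quantity can be computed without with() by indexing the Salary column directly; a minimal sketch:
# equivalent to the with() expression above
mean(abs(lpred - Gitters[testid, "Salary"]))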
Next we fit the lasso using glmnet. Since this package does not use formulas, we create x and y first.
x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))
y <- Gitters$Salary
The first line makes a call to model.matrix(), which produces the same matrix that was used by lm() (the -1 omits the intercept). This function automatically converts factors to dummy variables. The scale() function standardizes the matrix so each column has mean zero and variance one.
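One can verify the standardization directly; a minimal sketch (the printed values should be essentially zero and one, up to rounding):
# each column should have (approximately) mean zero and standard deviation one
round(colMeans(x), 10)
apply(x, 2, sd)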
library(glmnet)
cvfit <- cv.glmnet(x[-testid,], y[-testid],
    type.measure = "mae")
cpred <- predict(cvfit, x[testid,], s = "lambda.min")
mean(abs(y[testid] - cpred))
[1] 252.2994
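The value of lambda selected by cross-validation, and the coefficients at that value, can be inspected as well; a minimal sketch (output not shown):
cvfit$lambda.min                  # value of lambda minimizing the cross-validated MAE
coef(cvfit, s = "lambda.min")     # lasso coefficients at that value; some are zero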
To fit the neural network, we first set up a model structure that describes the network.
library(keras)
modnn <- keras_model_sequential() %>%
  layer_dense(units = 50, activation = "relu",
    input_shape = ncol(x)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 1)
We have created a vanilla model object called modnn, and have added details about the successive layers in a sequential manner, using the function keras_model_sequential(). The pipe operator %>% passes the previous term as the first argument to the next function, and returns the result. It allows us to specify the layers of a neural network in a readable form.
We illustrate the use of the pipe operator on a simple example. Earlier, we created x using the command
x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))
We first make a matrix, and then we standardize each of its columns. Compound expressions like this can be difficult to parse. We could have obtained the same result using the pipe operator:
x <- model.matrix(Salary ~ . - 1, data = Gitters) %>% scale()
Using the pipe operator makes it easier to follow the sequence of operations.
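The same idea applies to any chain of function calls; a minimal sketch with ordinary numeric functions:
# sum(sqrt(c(1, 4, 9))) written as a left-to-right pipeline
c(1, 4, 9) %>% sqrt() %>% sum()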
We now return to our neural network. The object modnn has a single hidden layer with 50 hidden units, and a ReLU activation function. It then has a dropout layer, in which a random 40% of the 50 activations from the previous layer are set to zero during each iteration of the stochastic gradient descent algorithm. Finally, the output layer has just one unit with no activation function, indicating that the model provides a single quantitative output.
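We can print a summary of the model to see the output shape and parameter count of each layer; a minimal sketch (output not shown):
summary(modnn)   # layer-by-layer output shapes and parameter counts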
As a variation, one could repeat this exercise on the Hitters dataset with the same preprocessing steps, but replace modnn with a model that has a single hidden layer of 100 hidden units, and a tanh activation function.
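A minimal sketch of that alternative architecture, built with the same keras functions as above (the name modnn2 is ours, chosen for illustration):
# hypothetical variant: one hidden layer of 100 tanh units, same inputs as modnn
modnn2 <- keras_model_sequential() %>%
  layer_dense(units = 100, activation = "tanh",
    input_shape = ncol(x)) %>%
  layer_dense(units = 1)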