The keras package comes with a number of example datasets, including the MNIST digit data. Our first step is to load the MNIST data; the dataset_mnist() function is provided for this purpose.
library(keras)  # assumes the keras package and its Python backend are installed
mnist <- dataset_mnist()
x_train <- mnist$train$x
g_train <- mnist$train$y
x_test <- mnist$test$x
g_test <- mnist$test$y
dim(x_train)
[1] 60000 28 28
dim(x_test)
[1] 10000 28 28
There are 60,000 images in the training data and 10,000 in the test data. The images are \(28 \times 28\), and stored as a three-dimensional array, so we need to reshape them into a matrix. Also, we need to “one-hot” encode the class label. Luckily keras has a lot of built-in functions that do this for us.
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
y_train <- to_categorical(g_train, 10)
y_test <- to_categorical(g_test, 10)
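As a quick check that the reshaping and one-hot encoding behaved as expected, we can look at the new dimensions: each image is now a row of 784 pixel values, and each label a row of 10 indicator values.
dim(x_train)
[1] 60000 784
dim(y_train)
[1] 60000 10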
Neural networks are somewhat sensitive to the scale of the inputs. For example, ridge and lasso regularization are affected by scaling. Here the inputs are eight-bit grayscale values between 0 and 255, so we rescale to the unit interval.
x_train <- x_train / 255
x_test <- x_test / 255
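A quick way to confirm the rescaling is to inspect the range of the pixel values, which should now be the unit interval rather than 0 to 255.
range(x_train)
[1] 0 1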
Now we are ready to fit our neural network.
modelnn <- keras_model_sequential()
modelnn %>%
  layer_dense(units = 256, activation = "relu",
              input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")
The first layer goes from \(28 \times 28 = 784\) input units to a hidden layer of 256 units, which uses the ReLU activation function. This is specified by a call to layer_dense(), which takes as input a modelnn object and returns a modified modelnn object. This is then piped through layer_dropout() to perform dropout regularization. The second hidden layer comes next, with 128 hidden units, followed by another dropout layer. The final layer is the output layer, with activation "softmax" for the 10-class classification problem; it defines the map from the second hidden layer to class probabilities.
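Concretely, if \(Z_0, Z_1, \ldots, Z_9\) denote the ten linear combinations computed at the output layer, the softmax activation converts them into class probabilities via
\[
\Pr(Y = m \mid X) = \frac{e^{Z_m}}{\sum_{\ell=0}^{9} e^{Z_\ell}}, \qquad m = 0, 1, \ldots, 9,
\]
so the ten outputs are nonnegative and sum to one.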
Finally, we use summary()
to summarize the model, and to make
sure we got it all right.
summary(modelnn)
Model: "sequential_1"
______________________________________________________________
Layer (type)                Output Shape             Param #
==============================================================
dense_4 (Dense)             (None, 256)              200960
______________________________________________________________
dropout_2 (Dropout)         (None, 256)              0
______________________________________________________________
dense_3 (Dense)             (None, 128)              32896
______________________________________________________________
dropout_1 (Dropout)         (None, 128)              0
______________________________________________________________
dense_2 (Dense)             (None, 10)               1290
==============================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
______________________________________________________________
The parameter count for each layer includes a bias term. For example, the first hidden layer involves \((784 + 1) \times 256 = 200,960\) parameters, and summing the counts for the three dense layers gives the total of 235,146.
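The remaining entries in the Param # column can be reproduced the same way; the short calculation below is plain arithmetic, nothing keras-specific.
(784 + 1) * 256   # first hidden layer: 200960
(256 + 1) * 128   # second hidden layer: 32896
(128 + 1) * 10    # output layer: 1290
200960 + 32896 + 1290   # total: 235146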
Notice that the layer names such as dropout_1 and dense_2 have subscripts. These may appear somewhat random; in fact, if you fit the same model again, they will change. They are of no consequence: they vary because the model-specification code is run in Python, and these subscripts are incremented every time keras_model_sequential() is called.
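To actually fit the model we still need to choose a loss function and an optimizer, and then run the training loop. A minimal sketch of those two steps is shown below; the particular optimizer, number of epochs, batch size, and validation split are illustrative choices, not values prescribed above.
modelnn %>% compile(loss = "categorical_crossentropy",
                    optimizer = optimizer_rmsprop(),
                    metrics = c("accuracy"))
history <- modelnn %>%
  fit(x_train, y_train, epochs = 30, batch_size = 128,
      validation_split = 0.2)
The categorical cross-entropy loss matches the one-hot encoded labels created with to_categorical(), and the returned history object records the training and validation metrics for each epoch.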
As an exercise, try modifying modelnn to use hidden layers of 64 and 32 hidden units, adding a ReLU activation function to each hidden layer, and refit the model.