In this exercise, we will build a classification model using a logistic regression with L1 regularization (LASSO). This follows the feature engineering using Singular Value Decomposition (SVD) in the previous exercise.
We start by creating the training basetable.
This is done by column-binding the usersTRAIN data frame and the different SVD dimensions.
library(dplyr)  # provides %>% and rename_at()
BasetableTRAIN <- data.frame(usersTRAIN, svd_pagesTRAIN, svd_categoriesTRAIN, svd_groupsTRAIN)
BasetableTRAIN <- BasetableTRAIN %>%
  rename_at(paste0("X", 1:50), ~paste0("pages_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".1"), ~paste0("categories_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".2"), ~paste0("groups_dim", 1:50))
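The renaming above relies on how data.frame() names the columns of unnamed matrices: the first matrix yields X1, X2, ..., and later matrices get the suffixes .1, .2, and so on to keep the names unique. A minimal sketch of that behaviour follows, assuming the SVD matrices carry no column names; the toy objects users_toy, svd_a and svd_b are hypothetical stand-ins, and base R is used for the renaming to keep the example self-contained.

```r
# Hypothetical toy stand-ins for usersTRAIN and two SVD score matrices.
users_toy <- data.frame(user_id = 1:3, donor = c(0, 1, 0))
svd_a <- matrix(rnorm(6), nrow = 3)  # 3 users x 2 dimensions, no column names
svd_b <- matrix(rnorm(6), nrow = 3)

# Column-bind them; data.frame() generates unique names for the matrix columns.
toy <- data.frame(users_toy, svd_a, svd_b)
auto_names <- names(toy)
print(auto_names)

# Rename the generated columns so their origin is explicit.
names(toy)[names(toy) %in% paste0("X", 1:2)] <- paste0("pages_dim", 1:2)
names(toy)[names(toy) %in% paste0("X", 1:2, ".1")] <- paste0("categories_dim", 1:2)
print(names(toy))
```

Printing the names before renaming is a quick way to confirm that the generated names match the patterns used in the rename calls.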
We perform the same operation for the test basetable.
BasetableTEST <- data.frame(usersTEST, svd_pagesTEST, svd_categoriesTEST, svd_groupsTEST)
BasetableTEST <- BasetableTEST %>%
  rename_at(paste0("X", 1:50), ~paste0("pages_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".1"), ~paste0("categories_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".2"), ~paste0("groups_dim", 1:50))
The dependent variable (donor) is removed from each basetable and stored in a separate vector.
yTRAIN <- BasetableTRAIN$donor
BasetableTRAIN$donor <- NULL
yTEST <- BasetableTEST$donor
BasetableTEST$donor <- NULL
Now, we train the LASSO model on the training data.
library(glmnet)
LR <- glmnet(
  x = data.matrix(BasetableTRAIN),
  y = yTRAIN,
  family = "binomial",
  alpha = 1  # LASSO penalty (glmnet's default)
)
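A single glmnet() call fits the model for a whole sequence of penalty values (lambda), not just one. The sketch below illustrates this on synthetic data and shows how the L1 penalty sets coefficients to zero; the data and the names x_demo, y_demo and fit_demo are hypothetical and unrelated to the exercise's basetable.

```r
library(glmnet)

set.seed(42)
# Synthetic stand-in data (hypothetical): 200 observations, 20 predictors,
# of which only the first two influence the outcome.
n <- 200
p <- 20
x_demo <- matrix(rnorm(n * p), nrow = n)
y_demo <- rbinom(n, size = 1, prob = plogis(2 * x_demo[, 1] - 2 * x_demo[, 2]))

fit_demo <- glmnet(x = x_demo, y = y_demo, family = "binomial")

# The fitted object stores the full sequence of lambda values.
print(length(fit_demo$lambda))

# Coefficients (intercept plus p predictors) at a strong and a weak penalty.
coef_strong <- coef(fit_demo, s = 0.1)
coef_weak <- coef(fit_demo, s = 0.001)

# Count the non-zero coefficients at each penalty level.
print(sum(as.matrix(coef_strong) != 0))
print(sum(as.matrix(coef_weak) != 0))
```

Comparing the two counts gives a feel for how strongly a given penalty value shrinks the model.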
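After training, we apply the model to the test data. Note that s is the value of the regularization parameter lambda at which predictions are computed, not alpha; in glmnet, alpha is the elastic-net mixing parameter, and alpha = 1 (the default) gives the LASSO. Ideally, the value of lambda should be tuned on a validation set or by cross-validation, but that is outside the scope of this exercise.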
predLRlasso <- predict(
  LR,
  newx = data.matrix(BasetableTEST),
  type = "response",
  s = 0.005
)
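The fixed value s = 0.005 is not tuned. One common way to choose the penalty is k-fold cross-validation with cv.glmnet(), sketched below on synthetic data. This is an illustration under stated assumptions rather than part of the exercise: the data and the names x_toy, y_toy and cv_fit are hypothetical, and the number of folds is an arbitrary choice.

```r
library(glmnet)

set.seed(1)
# Synthetic stand-in data (hypothetical), not the exercise's basetable.
n <- 200
p <- 10
x_toy <- matrix(rnorm(n * p), nrow = n)
y_toy <- rbinom(n, size = 1, prob = plogis(x_toy[, 1] - x_toy[, 2]))

# 5-fold cross-validation over glmnet's lambda sequence (binomial deviance).
cv_fit <- cv.glmnet(
  x = x_toy,
  y = y_toy,
  family = "binomial",
  alpha = 1,
  nfolds = 5
)

# Lambda with the lowest cross-validated error.
best_lambda <- cv_fit$lambda.min
print(best_lambda)

# Predicted probabilities at the selected lambda.
pred_toy <- predict(cv_fit, newx = x_toy, type = "response", s = "lambda.min")
```

In the exercise itself, the same call could be made with data.matrix(BasetableTRAIN) and yTRAIN, and the selected lambda passed as s when predicting on the test basetable.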