LASSO for Donation Behavior Analysis

In this exercise, we build a classification model for donation behavior using logistic regression with L1 regularization (LASSO). The predictors include the Singular Value Decomposition (SVD) dimensions engineered in the previous exercise.

Preparing the Training and Testing Datasets

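We start by creating the training basetable by column-binding the usersTRAIN data frame with the SVD dimensions for pages, categories, and groups. The three SVD blocks arrive with the same default column names (X1 to X50), so data.frame() appends .1 and .2 to the duplicates; we then rename all of them to descriptive names.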

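library(dplyr)  # provides %>% and rename_at()

# Column-bind the user attributes with the three blocks of SVD dimensions
BasetableTRAIN <- data.frame(usersTRAIN, svd_pagesTRAIN, svd_categoriesTRAIN, svd_groupsTRAIN)
# Replace the auto-generated X1..X50 names with descriptive ones
BasetableTRAIN <- BasetableTRAIN %>%
  rename_at(paste0("X", 1:50), ~paste0("pages_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".1"), ~paste0("categories_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".2"), ~paste0("groups_dim", 1:50))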
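
The rename patterns above depend on how data.frame() makes duplicate column names unique. The following is a minimal sketch of that behavior on toy data; the objects users, svd_a, svd_b, and svd_c are hypothetical stand-ins with 2 columns each instead of 50, and they assume the SVD outputs are matrices without column names.

```r
# Toy stand-ins for the user table and three SVD outputs (hypothetical data).
set.seed(1)
users <- data.frame(id = 1:3, donor = c(0, 1, 0))
svd_a <- matrix(rnorm(6), nrow = 3)  # unnamed columns, like a raw SVD result
svd_b <- matrix(rnorm(6), nrow = 3)
svd_c <- matrix(rnorm(6), nrow = 3)

# data.frame() names unnamed matrix columns X1, X2, ... and then
# disambiguates repeats by appending .1, .2, ...
toy <- data.frame(users, svd_a, svd_b, svd_c)
print(names(toy))
```

The first block keeps the plain X names, the second block gets the .1 suffix, and the third gets .2, which is why the three rename_at() calls can target each block separately.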

We perform the same operation for the test basetable.

BasetableTEST <- data.frame(usersTEST, svd_pagesTEST, svd_categoriesTEST, svd_groupsTEST)
BasetableTEST <- BasetableTEST %>%
  rename_at(paste0("X", 1:50), ~paste0("pages_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".1"), ~paste0("categories_dim", 1:50)) %>%
  rename_at(paste0("X", 1:50, ".2"), ~paste0("groups_dim", 1:50))

The dependent variable (donor) is removed from both basetables and stored in separate vectors, so that the basetables contain only predictors.

yTRAIN <- BasetableTRAIN$donor
BasetableTRAIN$donor <- NULL
yTEST <- BasetableTEST$donor
BasetableTEST$donor <- NULL
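
Because the model is later fitted and applied on plain matrices, it is worth confirming that the training and test basetables have the same columns in the same order. A minimal sketch of such a check, using hypothetical toy tables (train and test) in place of the real basetables:

```r
# Hypothetical toy basetables standing in for BasetableTRAIN / BasetableTEST.
train <- data.frame(age = c(30, 41), pages_dim1 = c(0.1, 0.3))
test  <- data.frame(age = c(25, 52), pages_dim1 = c(0.2, 0.4))

# Columns present in one table but not in the other (both should be empty).
only_in_train <- setdiff(names(train), names(test))
only_in_test  <- setdiff(names(test), names(train))

# TRUE only if the names match exactly, including their order.
same_layout <- identical(names(train), names(test))
stopifnot(same_layout)
```

With the real data, the same check would be run on names(BasetableTRAIN) and names(BasetableTEST).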

Training the LASSO

Now, we train the LASSO model on the training data.

library(glmnet)

# alpha = 1 (the glmnet default) selects the pure L1 (LASSO) penalty
LR <- glmnet(
  x = data.matrix(BasetableTRAIN),
  y = yTRAIN,
  family = "binomial",
  alpha = 1
)
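
A single glmnet() call fits the model along a whole sequence of penalty values rather than at one value. The sketch below illustrates this on synthetic data (x and y are made up for illustration, not the donation data) and shows how to inspect which coefficients remain nonzero at a given penalty.

```r
library(glmnet)

# Synthetic data: 200 observations, 10 predictors, only the first two matter.
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200)
y <- rbinom(200, 1, plogis(1.5 * x[, 1] - 1.0 * x[, 2]))

fit <- glmnet(x = x, y = y, family = "binomial", alpha = 1)

# Number of penalty values on the fitted regularization path.
n_lambdas <- length(fit$lambda)

# Coefficients at one penalty value: a sparse column with the
# intercept in row 1 followed by one row per predictor.
b <- coef(fit, s = 0.05)
n_nonzero <- sum(b[-1, 1] != 0)  # predictors kept by the LASSO at s = 0.05
print(n_nonzero)
```

Larger penalty values generally shrink more coefficients to exactly zero, which is what makes the LASSO useful for selecting among the many SVD dimensions.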

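After training, we apply the model to the test data. Note that s is the value of the regularization parameter lambda at which predictions are made (in glmnet, alpha is the elastic-net mixing parameter, not the penalty strength). Ideally, the value of lambda should be tuned on a validation set, but that is outside the scope of this exercise.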

predLRlasso <- predict(
  LR,
  newx = data.matrix(BasetableTEST),
  type = "response",
  s = 0.005
)
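
Although tuning is outside the scope of this exercise, one common way to choose the value passed as s is k-fold cross-validation with cv.glmnet(). The following is a minimal sketch on synthetic data; x, y, and the choice of 5 folds are assumptions for illustration, not part of the original exercise.

```r
library(glmnet)

# Synthetic data standing in for the training basetable and labels.
set.seed(42)
x <- matrix(rnorm(300 * 8), nrow = 300)
y <- rbinom(300, 1, plogis(x[, 1] - x[, 2]))

# 5-fold cross-validation over the penalty path, scored by binomial deviance.
cvfit <- cv.glmnet(
  x = x,
  y = y,
  family = "binomial",
  alpha = 1,
  nfolds = 5,
  type.measure = "deviance"
)

# Penalty value with the lowest cross-validated deviance.
best_lambda <- cvfit$lambda.min

# Predicted probabilities at that penalty for the first five rows.
pred <- predict(cvfit, newx = x[1:5, , drop = FALSE],
                type = "response", s = "lambda.min")
```

In the exercise setting, the selected value would replace the fixed s = 0.005, with the cross-validation run on the training data only so that the test set stays untouched.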