Identifying Important Features with LASSO

In this exercise, we explore how to identify the most important features in a LASSO model. The importance of a feature can be measured by the absolute value of its beta coefficient. The LASSO penalty shrinks coefficients towards zero and sets the coefficients of the least useful features exactly to zero, so the features that keep large absolute coefficients are the ones the model relies on most. Comparing magnitudes in this way assumes that the predictors are measured on comparable scales.
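To see why the penalty zeroes out weak features, consider the simplest case: linear regression with an orthonormal design (X'X = I) and the objective (1/2)||y - Xb||^2 + lambda * ||b||_1. In that special case, which is an assumption made here for illustration and not the logistic model used below, each LASSO coefficient equals the least-squares coefficient passed through a soft-thresholding operator. The helper soft_threshold() and the coefficient values are hypothetical.

```r
# Soft-thresholding operator: the LASSO solution for a single coefficient
# when the design matrix is orthonormal (X'X = I), linear regression only.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# Hypothetical least-squares coefficients for five features
ols <- c(x1 = 2.5, x2 = -1.2, x3 = 0.3, x4 = -0.1, x5 = 0.8)

# Large coefficients are shrunk by lambda; small ones become exactly 0
lasso <- soft_threshold(ols, lambda = 0.5)
print(lasso)

# Features dropped by the penalty (|OLS coefficient| <= lambda)
dropped <- names(lasso)[lasso == 0]
print(dropped)
```

Every surviving coefficient is shrunk by the same amount, so in this idealized setting the ranking by absolute value among the kept features is the same as the least-squares ranking.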

Calculating Beta Coefficients

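First, we extract the absolute beta coefficients of the fitted LASSO logistic regression model LR at lambda = 0.005 (the s argument of coef()) and store them in a tibble with the columns variables and importance.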

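library(glmnet)  # provides coef() for the fitted model LR
library(dplyr)
library(tibble)
# Absolute coefficients of LR at lambda = 0.005
Beta_coeff <- abs(coef(LR, s = 0.005)) %>%
  as.matrix() %>%                          # sparse coefficient matrix -> dense matrix
  as_tibble(rownames = 'variables') %>%    # turn the row names into a column
  rename(importance = 2)                   # name the coefficient column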
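The code above assumes a model LR that has already been fitted with glmnet. A self-contained sketch, assuming the glmnet package and simulated data (the feature names, sample size and true coefficients are made up for illustration), fits a LASSO logistic regression and builds the same kind of importance table in base R. Note that coef() also returns an (Intercept) row.

```r
library(glmnet)

set.seed(42)
n <- 500
p <- 20

# Simulated predictors; only the first three influence the outcome
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("feature_", 1:p)))
logit <- 1.5 * x[, 1] - 1 * x[, 2] + 0.5 * x[, 3]
y <- rbinom(n, size = 1, prob = plogis(logit))

# alpha = 1 (the default) gives the LASSO penalty
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at lambda = 0.005, returned as a sparse one-column matrix
b <- as.matrix(coef(fit, s = 0.005))

# Base-R equivalent of the variables/importance table
beta_table <- data.frame(
  variables  = rownames(b),
  importance = abs(b[, 1]),
  row.names  = NULL,
  stringsAsFactors = FALSE
)
head(beta_table)
```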

Identifying Top Variables

Next, we identify the 10 most important variables by sorting the absolute coefficients in descending order and keeping the top 10. The intercept is removed first because it is not a feature.

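library(ggplot2)
library(forcats)

top_variables <- Beta_coeff %>%
  filter(variables != "(Intercept)") %>%                 # the intercept is not a feature
  slice_max(importance, n = 10, with_ties = FALSE)       # exactly 10 rows, largest first

importance_plot <- ggplot(top_variables, aes(x = fct_reorder(variables, importance), y = importance)) +
  geom_col() +
  coord_flip() +
  labs(y = "Absolute value of beta coefficient", x = "Variables")

importance_plot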
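Two details of the top-10 selection are easy to miss: coef() includes an (Intercept) row, which is not a feature, and ties at the 10th position can return more than 10 rows. The sketch below uses a small hypothetical importance table (the names f1 to f12 and their values are invented) together with dplyr's filter() and slice_max() to show both effects.

```r
library(dplyr)

# Hypothetical importance table in the same shape as Beta_coeff:
# an intercept row plus 12 features; f10 and f11 tie at 0.4
toy <- data.frame(
  variables  = c("(Intercept)", paste0("f", 1:12)),
  importance = c(3.0, 1.2, 0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.42, 0.41, 0.4, 0.4, 0.0),
  stringsAsFactors = FALSE
)

# Drop the intercept, then keep exactly 10 rows sorted by importance
top_variables <- toy %>%
  filter(variables != "(Intercept)") %>%
  slice_max(importance, n = 10, with_ties = FALSE)
print(top_variables)

# Keeping ties returns 11 rows here, because f10 and f11 share 10th place
n_with_ties <- toy %>%
  filter(variables != "(Intercept)") %>%
  slice_max(importance, n = 10) %>%
  nrow()
print(n_with_ties)
```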

Exercise

Try to obtain the beta coefficients of the logistic regression model and store them in beta_coefficients. Use s = 0.005. Name the columns of the data frame features and importance. Leave all other parameters unchanged.

To download the training basetable for the cats and dogs, click here1.

To download the test basetable for the cats and dogs, click here2.


Assume that: