In this exercise, we will explore how to identify the most important features in a LASSO model. Feature importance can be measured by the absolute value of the beta coefficients: because the LASSO penalty shrinks the coefficients of uninformative features towards (or exactly to) zero, the features that keep the largest absolute coefficients are the ones the model relies on most.
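The lines below are a minimal sketch of how such a model could have been fitted; the names train_X (numeric feature matrix) and train_y (binary response), as well as the glmnet call itself, are assumptions, since the fitted model LR is simply given in this exercise.

library(glmnet)     # LASSO / elastic-net models
library(tidyverse)  # dplyr, tibble, ggplot2 and forcats used below

# Hypothetical fit of the model referred to as LR in the rest of the exercise.
LR <- glmnet(x = train_X, y = train_y,
             family = "binomial",  # logistic regression
             alpha = 1)            # alpha = 1 selects the LASSO penalty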
First, we extract the beta coefficients from the fitted logistic regression model and take their absolute values.
Beta_coeff <- abs(coef(LR, s = 0.005)) %>%   # coefficients at lambda = 0.005, in absolute value
  as.matrix() %>%                            # convert the sparse matrix returned by coef()
  as_tibble(rownames = 'variables') %>%      # keep the variable names as a column
  rename_at(2, ~"importance")                # call the coefficient column 'importance'
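If you want a quick sanity check before plotting, the resulting tibble can be inspected directly (assuming the pipeline above ran without errors):

head(Beta_coeff)   # one row per variable: its name and the absolute value of its coefficient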
Next, we identify the 10 most important variables or dimensions, i.e. the 10 variables with the largest values in the importance column, and plot them.
top_variables <- Beta_coeff %>% top_n(n = 10)   # top_n() ranks by the last column (importance) when no weight is given

ggplot(top_variables, aes(x = fct_reorder(variables, importance), y = importance)) +
  geom_bar(stat = 'identity') +   # bar height = absolute coefficient
  coord_flip() +                  # horizontal bars are easier to read with long variable names
  labs(y = "Absolute value of the beta coefficients", x = "Variables")
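Note that coef() also returns a row for the intercept, which is not a real feature. If you want to exclude it, one possible variant of the selection step (assuming the same Beta_coeff tibble as above) uses filter() and slice_max() instead of top_n():

top_variables <- Beta_coeff %>%
  filter(variables != "(Intercept)") %>%   # drop the intercept row
  slice_max(importance, n = 10)            # keep the 10 largest absolute coefficients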
Try to obtain the beta coefficients of the logistic regression model and store them in beta_coefficients. The s parameter should be 0.005. Set the column names of the dataframe equal to features and importance. All other parameters remain unchanged.
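A sketch of one possible solution, reusing the pattern from the example above (LR and s = 0.005 are taken from the exercise; only the column names differ):

beta_coefficients <- abs(coef(LR, s = 0.005)) %>%
  as.matrix() %>%
  as_tibble(rownames = 'features') %>%   # first column is called 'features'
  rename_at(2, ~"importance")            # second column is called 'importance'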
To download the train basetable for the cats and dogs dataset, click here.
To download the test basetable for the cats and dogs dataset, click here.
Assume that: