Interpreting Important Dimensions in LASSO

In this exercise, we will interpret the most important dimensions identified in the previous exercise. We will use the matrix V, whose entry (i, j) gives the score of Facebook page i on dimension j, to determine which Facebook pages, categories, and groups are most strongly related to the top dimensions.
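
To make the role of V concrete, here is a minimal sketch on a made-up user-by-page matrix. It assumes the decomposition comes from base R's svd(); the function that produced svd_pages in the previous exercise may differ. The objects likes and toy_svd are hypothetical and not part of the exercise data.

```r
# Hypothetical toy data: rows are users, columns are pages (1 = user likes page).
likes <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 1, 1,
    0, 1, 1, 1,
    1, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("page_a", "page_b", "page_c", "page_d"))
)

# Base R singular value decomposition: likes = U diag(d) V'
toy_svd <- svd(likes)

# V has one row per page (column of `likes`) and one column per dimension.
V <- toy_svd$v
rownames(V) <- colnames(likes)
dim(V)  # 4 pages x 4 dimensions

# The score of page i on dimension j is V[i, j], e.g. page_b on dimension 2.
V["page_b", 2]
```

Each row of V describes one page, which is why the page names can later be attached to V row by row.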

Interpreting Pages, Categories, and Groups

First, we create data frames for pages, categories, and groups using the matrix V. Each V holds the right singular vectors from the Singular Value Decomposition (SVD) of the respective data, and we keep only its first k columns (dimensions).

# Keep the first k columns of V (auto-named V1, ..., Vk) and label each row with its page
interpretation_pages <- as.data.frame(as.matrix(svd_pages$v[, 1:k])) %>%
  mutate(page = top_pages)

We do the same for the categories of Facebook pages.

interpretation_categories <- as.data.frame(as.matrix(svd_categories$v[, 1:k])) %>%
  mutate(category = top_categories)

We do the same for the Facebook groups.

interpretation_groups <- as.data.frame(as.matrix(svd_groups$v[, 1:k])) %>%
  mutate(group = top_groups)
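
The three blocks above follow the same pattern, so they could be wrapped in one function. The sketch below is one possible way to do that in base R. The name make_interpretation() and its arguments are hypothetical, not part of the course material, and the toy input only stands in for objects such as svd_pages and top_pages.

```r
# Hypothetical helper: first k columns of V as a data frame, plus a label column.
make_interpretation <- function(svd_obj, labels, k, label_col) {
  stopifnot(k <= ncol(svd_obj$v), length(labels) == nrow(svd_obj$v))
  # drop = FALSE keeps a matrix even when k = 1; columns are auto-named V1..Vk
  out <- as.data.frame(svd_obj$v[, seq_len(k), drop = FALSE])
  out[[label_col]] <- labels
  out
}

# Toy stand-in for the exercise objects (made-up 4 x 3 matrix, 3 "pages").
toy_svd <- svd(matrix(c(1, 0, 1, 1,
                        0, 1, 0, 1,
                        1, 1, 0, 0), nrow = 4))
toy_interpretation <- make_interpretation(
  toy_svd, labels = c("page_a", "page_b", "page_c"), k = 2, label_col = "page"
)
names(toy_interpretation)  # "V1" "V2" "page"
```

With the exercise objects, a call such as make_interpretation(svd_pages, top_pages, k, "page") should give the same columns as interpretation_pages, assuming svd_pages$v has no column names of its own.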

Identifying Top Pages, Categories, and Groups for Each Dimension

Next, we identify the pages, categories, and groups most related to each dimension. We sort them in descending order of the absolute value of their score on the respective dimension and select the top 10.
For example, to determine the pages most related to dimension 3:

(top_pages_dim3 <- interpretation_pages %>%
  arrange(desc(abs(V3))) %>%
  slice_head(n = 10) %>%
  select(V3, page) %>%
  as_tibble())
     V3      page                                                               
 1  -0.172   "Disney"                              
 2   0.169   "De Morgen"                           
 3   0.167   "Plan International Belgium"          
 4   0.164   "De Standaard"                        
 5   0.161   "Canvas"                              
 6   0.148   "Knack"                               
 7  -0.126   "Coca-Cola"                           
 8   0.122   "Artsen Zonder Grenzen (België)"
 9   0.122   "Groen"                               
10   0.121   "VRT NWS"             
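
The same sort-and-slice pipeline can be reused for any dimension and any of the three data frames. The sketch below wraps it in a function. It assumes dplyr is loaded, and the name top_items() and the toy data frame are hypothetical illustrations rather than part of the exercise.

```r
library(dplyr)

# Hypothetical helper: the n rows with the largest absolute score on one dimension.
top_items <- function(interpretation, dim_col, label_col, n = 10) {
  interpretation %>%
    arrange(desc(abs(.data[[dim_col]]))) %>%    # strongest scores first, sign ignored
    slice_head(n = n) %>%
    select(all_of(c(dim_col, label_col))) %>%
    as_tibble()
}

# Made-up scores, only to show the call.
toy <- data.frame(
  V1 = c(0.10, -0.40, 0.30, -0.05),
  V2 = c(-0.20, 0.15, 0.50, -0.60),
  page = c("page_a", "page_b", "page_c", "page_d")
)

top_items(toy, "V2", "page", n = 2)  # rows for page_d (-0.60) and page_c (0.50)
```

With the exercise objects, top_items(interpretation_pages, "V3", "page") would be expected to reproduce the table above.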

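We can repeat this process for the other dimensions among the top 10 most important features, and for categories and groups as well. For example, the categories most related to dimension 5 are: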

(top_categories_dim5 <- interpretation_categories %>%
  arrange(desc(abs(V5))) %>%
  slice_head(n = 10) %>%
  select(V5, category) %>%
  as_tibble())
     V5      category                                                     
 1  -0.193   "Theater voor uitvoerende kunsten"
 2  -0.181   "Festival"                        
 3   0.178   "Voor de lol"                     
 4  -0.173   "Café"
 5  -0.172   "Bar"                             
 6  -0.151   "Evenement"                       
 7   0.148   "Media"                           
 8  -0.146   "Jeugdorganisatie"                
 9  -0.142   "Restaurant"                      
10   0.140   "Tv-netwerk"    
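
Because the ranking uses absolute values, a top 10 mixes items with positive and negative scores, which sit at opposite ends of the dimension. Reading the two groups separately can make a dimension easier to interpret. The sketch below shows one way to split such a table. The name split_by_sign() is hypothetical, and the small data frame copies only the first three rows of the category table above.

```r
# Hypothetical helper: separate the positive and negative ends of one dimension.
split_by_sign <- function(top_table, dim_col) {
  scores <- top_table[[dim_col]]
  list(
    positive = top_table[scores > 0, , drop = FALSE],
    negative = top_table[scores < 0, , drop = FALSE]
  )
}

# First three rows of the dimension 5 category table.
top5 <- data.frame(
  V5 = c(-0.193, -0.181, 0.178),
  category = c("Theater voor uitvoerende kunsten", "Festival", "Voor de lol")
)

ends <- split_by_sign(top5, "V5")
ends$positive$category  # "Voor de lol"
ends$negative$category  # "Theater voor uitvoerende kunsten" "Festival"
```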

Multiple choice

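The pages most related to dimension six are listed below the two statements. Type the number of the correct answer into the Dodona environment.

  1. The category 'danceclub' is probably related to dimension six.
  2. The category 'danceclub' is probably not related to dimension six.

Pages most related to dimension six:

    V6     page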
 1  0.172  Queen             
 2  0.157  Stromae           
 3 -0.138  Tasty             
 4  0.137  Pink Floyd        
 5  0.137  U2                
 6 -0.123  IKEA              
 7  0.117  The Beatles       
 8  0.117  Adele             
 9  0.116  Belgian Red Devils
10  0.114  Sporza