In this exercise, we will use the connections between nodes in a social network to predict whether a customer will churn. This technique is known as relational neighbor classification.

Setting Up

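Before we begin, we need the sna package for social network analysis. We also need igraph, because the churn labels are read from the igraph object g later on (V(g)). The following code installs either package if it is missing and then loads both:

# Install sna and igraph only if they are not installed yet
for (pkg in c("sna", "igraph")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cran.rstudio.com/", quiet = TRUE)
  }
}
library(sna)     # also attaches the network package: network(), as.sociomatrix()
library(igraph)  # provides V() for reading vertex attributes of g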

Relational Neighbor Classifier

We can convert the bank network into a network object using the network function:

net <- network(BankNetwork, matrix.type = "edgelist", directed = FALSE)

The relational neighbor classifier predicts a node's label from the labels of the nodes it is connected to. It assumes that connected customers tend to behave alike, so a customer whose neighbors churned is more likely to churn. Here, the label we predict is churn.
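As a formula, the estimate computed in this exercise for a node i with neighbor set N(i) and labels y_j (1 = churn, 0 = no churn) is the share of its labeled neighbors that churned. Neighbors with an unknown label are left out of both counts, and the second identity only holds when node i has at least one labeled neighbor:

```latex
P(\text{churn} \mid i) = \frac{\lvert \{ j \in N(i) : y_j = 1 \} \rvert}{\lvert \{ j \in N(i) : y_j \in \{0, 1\} \} \rvert},
\qquad
P(\text{non-churn} \mid i) = 1 - P(\text{churn} \mid i)
```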

First, we need to get the adjacency matrix of the network:

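adjacency <- as.sociomatrix(net)
str(adjacency)

The result is a 10 × 10 matrix of 0s and 1s whose rows and columns are named after the customers (A, B, C, ...):

 num [1:10, 1:10] 0 1 1 1 1 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "A" "B" "C" "D" ...
  ..$ : chr [1:10] "A" "B" "C" "D" ...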
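To see what as.sociomatrix produces for an undirected network, here is a minimal base R sketch that builds the same kind of matrix by hand. The edge list is hypothetical toy data, not BankNetwork, and this illustrates the idea rather than how the network package implements it:

```r
# Hypothetical toy edge list (not the BankNetwork data): one row per undirected edge
edges <- data.frame(from = c("A", "A", "A", "B", "C"),
                    to   = c("B", "C", "D", "C", "E"),
                    stringsAsFactors = FALSE)

# One shared, sorted ordering of node names for rows and columns
nodes <- sort(unique(c(edges$from, edges$to)))

# All-zero square matrix with the node names as dimnames
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Undirected network: set each edge in both directions so the matrix is symmetric
adj[cbind(edges$from, edges$to)] <- 1
adj[cbind(edges$to, edges$from)] <- 1

adj
```

Because every edge is written in both directions, row i and column i both list the neighbors of node i. This is why the loop below can read neighbors from a column.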

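Then, for each node, we count how many of its neighbors are churners and how many are non-churners:

# Churn labels from the igraph object g (1 = churner, 0 = non-churner, NA = unknown).
# Assumes g lists the customers in the same order as the rows of adjacency.
churn <- V(g)$churn
churn_neighbors    <- numeric(ncol(adjacency))
nonchurn_neighbors <- numeric(ncol(adjacency))
for (i in seq_len(ncol(adjacency))) {
  neighbors <- which(adjacency[, i] == 1)                            # nodes linked to node i
  churn_neighbors[i]    <- sum(churn[neighbors] == 1, na.rm = TRUE)  # known churners
  nonchurn_neighbors[i] <- sum(churn[neighbors] == 0, na.rm = TRUE)  # known non-churners
}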
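The same counts can be obtained without a loop. Multiplying the adjacency matrix by a 0/1 indicator vector sums that indicator over each node's neighbors. The sketch below uses a hypothetical four-customer network and labels, and new names (n_churn, n_nonchurn) so it does not overwrite the objects above:

```r
# Hypothetical toy network (not the BankNetwork data): 4 customers, undirected edges
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 1,
                1, 1, 0, 0,
                0, 1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))
# Hypothetical churn labels: 1 = churner, 0 = non-churner, NA = unknown
toy_churn <- c(A = 1, B = 0, C = NA, D = 1)

# 0/1 indicators for known churners and known non-churners (NA counts as neither)
is_churner    <- as.numeric(!is.na(toy_churn) & toy_churn == 1)
is_nonchurner <- as.numeric(!is.na(toy_churn) & toy_churn == 0)

# Row i of adj %*% x adds up x over the neighbors of node i
n_churn    <- as.vector(adj %*% is_churner)
n_nonchurn <- as.vector(adj %*% is_nonchurner)

n_churn     # [1] 0 2 1 0
n_nonchurn  # [1] 1 0 1 1
```

With a 0/1 symmetric adjacency matrix, this gives the same numbers as the loop. If the matrix held edge weights instead, the same product would give weighted counts, whereas the loop's adjacency[, i] == 1 test would skip any edge whose weight is not exactly 1.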

Next, we calculate the probability of churn for each node as the share of its labeled neighbors who churned:

prob_churn <- churn_neighbors / (churn_neighbors + nonchurn_neighbors)
data.frame(node = rownames(adjacency), prob_churn = prob_churn)
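A node whose neighbors all have unknown labels ends up with 0/0, which R evaluates to NaN. The sketch below uses hypothetical counts for five customers and makes that case explicit by returning NA. Because is.na() is TRUE for both NaN and NA, this mainly documents intent rather than changing downstream filtering:

```r
# Hypothetical neighbor counts for five customers; E has no labeled neighbors
nodes      <- c("A", "B", "C", "D", "E")
n_churn    <- c(0, 2, 1, 0, 0)
n_nonchurn <- c(1, 0, 1, 1, 0)

n_labeled <- n_churn + n_nonchurn
# Plain division would give 0/0 = NaN for E; return NA explicitly instead
p_churn <- ifelse(n_labeled > 0, n_churn / n_labeled, NA)

result <- data.frame(node = nodes, prob_churn = p_churn, stringsAsFactors = FALSE)
result
```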

And the probability of non-churn:

prob_nonchurn <- nonchurn_neighbors / (churn_neighbors + nonchurn_neighbors)

0.2500000 0.0000000 0.0000000 0.4000000 0.3333333 0.3333333 0.4000000 0.5000000 1.0000000 0.5000000

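For every node with at least one labeled neighbor, prob_churn and prob_nonchurn sum to 1, because the two counts divide its labeled neighbors between them. A node with no labeled neighbors gets NaN (0/0) for both.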
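A quick way to verify this is to compare the sums with 1 using a floating-point tolerance and to skip nodes with no labeled neighbors. The counts below are hypothetical:

```r
# Hypothetical neighbor counts; the last node has no labeled neighbors
n_churn    <- c(0, 2, 1, 0, 0)
n_nonchurn <- c(1, 0, 1, 1, 0)

n_labeled  <- n_churn + n_nonchurn
p_churn    <- n_churn / n_labeled      # NaN where n_labeled == 0
p_nonchurn <- n_nonchurn / n_labeled

# Check only nodes with at least one labeled neighbor; compare with a tolerance
has_labels  <- n_labeled > 0
sums_to_one <- isTRUE(all.equal(p_churn[has_labels] + p_nonchurn[has_labels],
                                rep(1, sum(has_labels))))
sums_to_one   # TRUE
```

all.equal is used instead of == because sums of floating-point ratios are not guaranteed to equal 1 exactly.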

Exercise

Consider the PadelNetwork from the previous exercise. Calculate the probability of being a man (gender = 1) for every node. Store the result in a data frame named df with two columns, nodes and prob_man.

Assume that:

The sna library has been loaded.
The igraph library has been loaded.
The PadelNetwork from the previous exercise has been loaded.
The graph g, built from the data frame, has been loaded.
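A minimal sketch of the same calculation on a made-up five-player network, written in base R so it runs without sna or igraph. The edge list and gender labels are hypothetical, every player here has a known gender, and the result is stored in toy_df rather than df. With the real data, you would build the adjacency matrix from PadelNetwork (for example with as.sociomatrix, as in the churn example) and take the labels from g, assuming it carries a gender vertex attribute:

```r
# Hypothetical toy padel network (NOT the real PadelNetwork): undirected edges
edges <- data.frame(from = c("P1", "P1", "P2", "P3", "P4"),
                    to   = c("P2", "P3", "P3", "P4", "P5"),
                    stringsAsFactors = FALSE)
# Hypothetical gender labels: 1 = man, 0 = woman
toy_gender <- c(P1 = 1, P2 = 0, P3 = 1, P4 = 0, P5 = 1)

# Symmetric 0/1 adjacency matrix, rows and columns in the order of toy_gender
nodes <- names(toy_gender)
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1
adj[cbind(edges$to, edges$from)] <- 1

# Relational neighbor estimate: share of each player's neighbors who are men
n_men       <- as.vector(adj %*% as.numeric(toy_gender == 1))
n_neighbors <- as.vector(rowSums(adj))
toy_df <- data.frame(nodes = nodes, prob_man = n_men / n_neighbors,
                     stringsAsFactors = FALSE)
toy_df
```

Here P3, for example, is linked to P1 (man), P2 and P4 (women), so its prob_man is 1/3.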
