Social Network Learning: Relational Neighbor Classifier

In this exercise, we will explore how to use the connections between different nodes in a social network to predict whether a customer will churn or not. This process is known as relational neighbor classification.

Setting Up

Before we begin, we need to load the sna package for social network analysis. If it isn’t installed yet, the following code installs it first:

if (!requireNamespace("sna", quietly = TRUE)) {
  install.packages("sna",
                   repos = "https://cran.rstudio.com/",
                   quiet = TRUE)
}
library(sna)   # also attaches network, which provides network() and as.sociomatrix()

Relational Neighbor Classifier

We convert the BankNetwork edge list into an undirected network object with the network() function:

net <- network(BankNetwork, matrix.type = "edgelist", directed = FALSE)

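The relational neighbor classifier predicts a node’s label from the labels of the nodes it is connected to: the larger the share of a customer’s neighbors who churned, the higher that customer’s estimated churn probability. Here, the label we predict is whether a customer churns.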
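In formula form (the notation is introduced here for illustration), let a_{ij} be the (i, j) entry of the adjacency matrix, y_j ∈ {0, 1} node j’s churn label, and L the set of nodes whose label is known. The estimate for node i is then:

$$
P(y_i = 1) = \frac{\sum_{j \in L} a_{ij}\, y_j}{\sum_{j \in L} a_{ij}}
$$

The numerator counts i’s labeled neighbors who churned (churn_neighbors in the code that follows) and the denominator counts all of i’s labeled neighbors (churn_neighbors + nonchurn_neighbors).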

First, we extract the network’s adjacency matrix, in which entry (i, j) is 1 if nodes i and j are tied and 0 otherwise, and inspect its structure:

adjacency <- as.sociomatrix(net)
str(adjacency)
 num [1:10, 1:10] 0 1 1 1 1 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "A" "B" "C" "D" ...
  ..$ : chr [1:10] "A" "B" "C" "D" ...
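To make the structure of this matrix concrete, here is a base-R sketch that builds the same kind of symmetric 0/1 matrix directly from an edge list. The edge list, node names and the variable names edges, nodes and adj are made up for illustration; they are not the BankNetwork data.

```r
# Hypothetical undirected edge list (made-up ties, not the BankNetwork data)
edges <- data.frame(from = c("A", "A", "B", "C"),
                    to   = c("B", "C", "C", "D"),
                    stringsAsFactors = FALSE)

# Square 0/1 matrix with one row and one column per node
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Convert node names to row/column positions and set both (i, j) and (j, i),
# because an undirected tie is symmetric
i <- match(edges$from, nodes)
j <- match(edges$to, nodes)
adj[cbind(i, j)] <- 1
adj[cbind(j, i)] <- 1

str(adj)
```

Because every tie is written in both directions, the matrix is symmetric, so reading a node’s neighbors from its row or from its column gives the same result.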

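Then, for each node, we count how many of its neighbors are churners and how many are non-churners:

# Churn labels (1 = churner, 0 = non-churner, NA = unknown), in the same node order as adjacency.
# V(g)$churn assumes an igraph object g carrying a churn vertex attribute (igraph must be loaded).
churn <- V(g)$churn
churn_neighbors <- numeric(length = ncol(adjacency))
nonchurn_neighbors <- numeric(length = ncol(adjacency))
for (i in seq_len(ncol(adjacency))) {
  neighbors <- which(adjacency[, i] == 1)                            # positions of node i's neighbors
  churn_neighbors[i] <- sum(churn[neighbors] == 1, na.rm = TRUE)     # neighbors who churned
  nonchurn_neighbors[i] <- sum(churn[neighbors] == 0, na.rm = TRUE)  # neighbors who did not churn
}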
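The loop can also be written as two matrix products, since row i of A %*% x sums x over node i’s neighbors. The sketch below uses a made-up 4-node network and made-up labels (ids, adj, churn, is_churner and is_nonchurner are hypothetical names); like the loop, it treats an unknown label as neither churner nor non-churner. It assumes a symmetric (undirected) adjacency matrix, so using rows instead of the loop’s columns gives the same counts.

```r
# Hypothetical 4-node network (symmetric 0/1 adjacency matrix)
ids <- c("A", "B", "C", "D")
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical churn labels in the same order as the rows; D's label is unknown
churn <- c(A = 1, B = 0, C = 1, D = NA)

# 0/1 indicators; an unknown label counts as neither churner nor non-churner
is_churner    <- as.numeric(!is.na(churn) & churn == 1)
is_nonchurner <- as.numeric(!is.na(churn) & churn == 0)

# Row i of adj %*% x sums x over node i's neighbors
churn_neighbors    <- as.vector(adj %*% is_churner)
nonchurn_neighbors <- as.vector(adj %*% is_nonchurner)
print(churn_neighbors)
print(nonchurn_neighbors)
```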

Next, we estimate each node’s churn probability as the share of its labeled neighbors who churned:

prob_churn <- churn_neighbors / (churn_neighbors + nonchurn_neighbors)
data.frame(node = rownames(adjacency), prob_churn = prob_churn)
0.7500000
1.0000000
1.0000000
0.6000000
0.6666667
0.6666667
0.6000000
0.5000000
0.0000000
0.5000000
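One edge case is worth checking: a node with no labeled neighbors has a denominator of 0, and 0 / 0 evaluates to NaN in R. The counts below are hypothetical and only illustrate how such nodes can be spotted; the variable no_evidence is an illustrative name.

```r
# Hypothetical neighbor counts for three nodes; node C has no labeled neighbors
churn_neighbors    <- c(A = 2, B = 0, C = 0)
nonchurn_neighbors <- c(A = 1, B = 3, C = 0)

# Share of labeled neighbors who churned
prob_churn <- churn_neighbors / (churn_neighbors + nonchurn_neighbors)

# 0 / 0 is NaN, so nodes without any labeled neighbor are easy to flag
no_evidence <- names(prob_churn)[is.nan(prob_churn)]
print(prob_churn)
print(no_evidence)
```

How to score such nodes (leave them as NaN, or fall back to some default) is a modeling choice that the exercise does not specify.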

And the probability of non-churn:

prob_nonchurn <- nonchurn_neighbors / (churn_neighbors + nonchurn_neighbors)
0.2500000 0.0000000 0.0000000 0.4000000 0.3333333 0.3333333 0.4000000 0.5000000 1.0000000 0.5000000

For every node with at least one labeled neighbor, prob_churn + prob_nonchurn equals 1, because each labeled neighbor counts toward exactly one of the two totals.
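A quick sanity check on the printed values shown above (rounded to 7 decimals) confirms this; all.equal() tolerates the tiny rounding and floating-point differences. In a live session you would apply the same check directly to the computed prob_churn and prob_nonchurn vectors, excluding any NaN entries.

```r
# Probabilities as printed above (rounded to 7 decimals)
prob_churn    <- c(0.7500000, 1.0000000, 1.0000000, 0.6000000, 0.6666667,
                   0.6666667, 0.6000000, 0.5000000, 0.0000000, 0.5000000)
prob_nonchurn <- c(0.2500000, 0.0000000, 0.0000000, 0.4000000, 0.3333333,
                   0.3333333, 0.4000000, 0.5000000, 1.0000000, 0.5000000)

# Element-wise sums; all.equal tolerates tiny rounding differences
sums <- prob_churn + prob_nonchurn
print(sums)
isTRUE(all.equal(sums, rep(1, length(sums))))
```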

Exercise

Consider the PadelNetwork from the previous exercise. Calculate the probability of being a man (gender = 1) for every node, and store the result in a data frame named df with two columns: nodes and prob_man.
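The PadelNetwork data is not reproduced here, so the following is only a sketch on made-up data: the player names, ties and gender labels are hypothetical, and it assumes gender is coded 1 for men and 0 for women with no missing values. It applies the same neighbor-share idea to gender and stores the result in df with the nodes and prob_man columns the exercise asks for.

```r
# Hypothetical stand-in for PadelNetwork: 5 players and 5 made-up ties
players <- c("P1", "P2", "P3", "P4", "P5")
adjacency <- matrix(0, nrow = 5, ncol = 5, dimnames = list(players, players))
ties <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
adjacency[ties] <- 1          # set (i, j)
adjacency[ties[, 2:1]] <- 1   # set (j, i): ties are undirected

# Hypothetical gender labels in row order: 1 = man, 0 = woman
gender <- c(1, 0, 1, 1, 0)

# Count male and female neighbors of each player
man_neighbors   <- as.vector(adjacency %*% gender)
woman_neighbors <- as.vector(adjacency %*% (1 - gender))

df <- data.frame(nodes = rownames(adjacency),
                 prob_man = man_neighbors / (man_neighbors + woman_neighbors),
                 stringsAsFactors = FALSE)
print(df)
```

With the real data, adjacency and gender would come from the PadelNetwork instead of being typed in by hand.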

To download the graph from the dataframe click: here1

To download the PadelNetwork click: here2


Assume that: