In this exercise, we will explore how to use the connections between different nodes in a social network to predict whether a customer will churn or not. This process is known as relational neighbor classification.
Before we begin, we need to load the sna package for social network analysis. If you haven't installed it yet, the following code will do it for you:
if (!require(sna)) {
  install.packages("sna",
                   repos = "https://cran.rstudio.com/",
                   quiet = TRUE)
}
require(sna)
We can convert the bank network into a network object using the network function:
net <- network(BankNetwork,matrix.type="edgelist", directed = FALSE)
The relational neighbor classifier uses the connections between nodes to predict a node’s attribute. In this case, we’re predicting whether a customer will churn or not.
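To make the idea concrete, here is a minimal sketch on a made-up four-node network (not the bank network). The names toy_adj and toy_churn and all values in them are hypothetical; the score for a node is simply the share of its neighbors that are known churners.

```r
# Hypothetical toy data: four customers, their links, and known churn labels
# (1 = churner, 0 = non-churner). These values are invented for illustration.
toy_adj <- matrix(c(0, 1, 1, 0,
                    1, 0, 1, 1,
                    1, 1, 0, 0,
                    0, 1, 0, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("A", "B", "C", "D"),
                                  c("A", "B", "C", "D")))
toy_churn <- c(A = 1, B = 0, C = 1, D = 0)

# Neighbors of B are the nodes with a 1 in B's column.
neighbors_B <- names(which(toy_adj[, "B"] == 1))

# Share of B's neighbors that are churners.
score_B <- mean(toy_churn[neighbors_B])
score_B
```

Here B is connected to A, C and D, two of which are churners, so its score is 2/3. B's own label plays no role in the score.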
First, we need to get the adjacency matrix of the network:
adjacency <- as.sociomatrix(net)
str(adjacency)
num [1:10, 1:10] 0 1 1 1 1 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "A" "B" "C" "D" ...
..$ : chr [1:10] "A" "B" "C" "D" ...
Then, we calculate the number of churners and non-churners surrounding each node:
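churn_neighbors <- numeric(length = ncol(adjacency))
nonchurn_neighbors <- numeric(length = ncol(adjacency))
# g is the igraph version of the network; its vertices are assumed to be
# in the same order as the rows of the adjacency matrix.
churn <- igraph::V(g)$churn
for (i in seq_len(ncol(adjacency))) {
  # Neighbors of node i have a 1 in column i
  neighbors <- which(adjacency[, i] == 1)
  churn_neighbors[i] <- sum(churn[neighbors] == 1, na.rm = TRUE)
  nonchurn_neighbors[i] <- sum(churn[neighbors] == 0, na.rm = TRUE)
}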
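As an aside, the same neighbor counts can probably be obtained without a loop, because multiplying a 0/1 adjacency matrix by a 0/1 label vector adds up the labels of each node's neighbors. The sketch below uses hypothetical toy data (toy_adj, toy_labels) rather than the bank network, and assumes an undirected network with no missing labels; with NA labels the loop and its na.rm = TRUE would still be needed.

```r
# Hypothetical symmetric 0/1 adjacency matrix and 0/1 churn labels,
# invented for illustration only.
toy_adj <- matrix(c(0, 1, 1, 0,
                    1, 0, 1, 1,
                    1, 1, 0, 0,
                    0, 1, 0, 0),
                  nrow = 4, byrow = TRUE)
toy_labels <- c(1, 0, 1, 0)

# Each entry sums the labels of that node's neighbors.
toy_churn_nb <- as.vector(toy_adj %*% toy_labels)

# Flipping the labels counts the non-churning neighbors instead.
toy_nonchurn_nb <- as.vector(toy_adj %*% (1 - toy_labels))

toy_churn_nb
toy_nonchurn_nb
```

For small networks the loop is easier to read; the matrix form mainly helps when the network is large.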
Next, we calculate the probability of churn for each node:
prob_churn <- churn_neighbors / (churn_neighbors + nonchurn_neighbors)
data.frame(rownames(adjacency),prob_churn)
0.7500000
1.0000000
1.0000000
0.6000000
0.6666667
0.6666667
0.6000000
0.5000000
0.0000000
0.5000000
And the probability of non-churn:
prob_nonchurn <- nonchurn_neighbors / (churn_neighbors + nonchurn_neighbors)
0.2500000 0.0000000 0.0000000 0.4000000 0.3333333 0.3333333 0.4000000 0.5000000 1.0000000 0.5000000
The sum of prob_churn and prob_nonchurn for each node should be equal to 1.
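A quick way to check this is sketched below on hypothetical neighbor counts (the toy_ vectors are made up). It also shows an edge case: a node with no labeled neighbors gives 0/0, which R reports as NaN, so such nodes have to be left out of the check.

```r
# Hypothetical neighbor counts for three nodes; the third node has
# no labeled neighbors at all.
toy_churn_nb    <- c(3, 0, 0)
toy_nonchurn_nb <- c(1, 2, 0)

toy_prob_churn    <- toy_churn_nb / (toy_churn_nb + toy_nonchurn_nb)
toy_prob_nonchurn <- toy_nonchurn_nb / (toy_churn_nb + toy_nonchurn_nb)

# The two probabilities add up to 1 wherever they are defined.
toy_total <- toy_prob_churn + toy_prob_nonchurn
defined <- !is.nan(toy_total)
sums_to_one <- isTRUE(all.equal(toy_total[defined], rep(1, sum(defined))))
sums_to_one
```

The same comparison could be applied to prob_churn and prob_nonchurn from this exercise; all.equal is used instead of == to allow for floating-point rounding.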
Consider the PadelNetwork from the previous exercise.
Calculate the probability of being a man (gender = 1) for all the nodes.
Display the output in a nice dataframe with column names equal to nodes and prob_man.
The dataframe itself should be stored in df.
To download the graph from the dataframe click: here1
To download the PadelNetwork click: here2
Assume that:
PadelNetwork from the previous exercise has been loaded.
g has been loaded.