In this exercise, we will extend the relational neighbor classifier by incorporating a logistic regression model. This process is known as probabilistic relational neighbor classification.
The probabilistic relational neighbor classifier uses the connections between nodes and a logistic regression model to predict a node’s attribute. In this case, we’re predicting whether a customer will churn or not.
The churn probabilities resulting from the logistic regression are given below.
preds <- c(0.66, 0.85, 0.77, 0.89, 0.56, 0.61, 0.71, 0.42, 0.40, 0.30)
Then, the vertex labels can be updated.
V(g)$churn <- preds
The number of churning and non-churning neighbors of each node is computed as in the previous exercise, except that the sums now run over the predicted churn probabilities rather than over 0/1 labels.
churn_neighbors_upt <- numeric(length = ncol(adjacency))
nonchurn_neighbors_upt <- numeric(length = ncol(adjacency))
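for (i in seq_len(ncol(adjacency))) {
  # Neighbors of node i are the rows with a 1 in column i
  neighbors <- which(adjacency[, i] == 1)
  # Sum the neighbors' churn and non-churn probabilities
  churn_neighbors_upt[i] <- sum(V(g)$churn[neighbors], na.rm = TRUE)
  nonchurn_neighbors_upt[i] <- sum(1 - V(g)$churn[neighbors], na.rm = TRUE)
}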
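The loop above can also be written without explicit iteration. The following is a minimal sketch in base R, assuming a 0/1 adjacency matrix; the toy network `adj_toy` and the probabilities `probs_toy` are hypothetical and only serve to check that both versions agree.

```r
# Hypothetical 4-node undirected network (symmetric 0/1 adjacency matrix).
adj_toy <- matrix(c(0, 1, 1, 0,
                    1, 0, 1, 0,
                    1, 1, 0, 1,
                    0, 0, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Hypothetical churn probabilities, one per node.
probs_toy <- c(0.9, 0.2, 0.6, 0.4)

# Loop version, mirroring the code above.
churn_loop <- numeric(ncol(adj_toy))
for (i in seq_len(ncol(adj_toy))) {
  neighbors <- which(adj_toy[, i] == 1)
  churn_loop[i] <- sum(probs_toy[neighbors])
}

# Vectorized version: crossprod(A, p) is t(A) %*% p, so entry i sums
# the probabilities of all nodes that have a 1 in column i.
churn_vec <- as.vector(crossprod(adj_toy, probs_toy))
nonchurn_vec <- as.vector(crossprod(adj_toy, 1 - probs_toy))

# Both approaches should give the same sums.
stopifnot(isTRUE(all.equal(churn_loop, churn_vec)))
```

Unlike the loop, this version does not drop missing values, so it only applies when every node has a probability.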
The probability of churn for each node is then:
prob_churn_upt <- churn_neighbors_upt / (churn_neighbors_upt + nonchurn_neighbors_upt)
data.frame(Node = rownames(adjacency), PRN = prob_churn_upt)
0.7675000
0.5400000
0.7200000
0.5800000
0.6160000
0.8300000
0.4800000
0.6033333
0.4975000
0.6450000
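The ratio computed above can be simplified. Each neighbor contributes its probability to the churn sum and the complement to the non-churn sum, so the denominator is just the number of neighbors, and the score is the mean of the neighbors' churn probabilities. The notation is introduced here for illustration and is not from the exercise: $N_i$ is the set of neighbors of node $i$ and $p_j$ is the logistic regression probability of node $j$. The simplification assumes no missing values.

```latex
P(\text{churn} \mid i)
  = \frac{\sum_{j \in N_i} p_j}{\sum_{j \in N_i} p_j + \sum_{j \in N_i} (1 - p_j)}
  = \frac{1}{|N_i|} \sum_{j \in N_i} p_j
```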
Consider the PadelNetwork and its probabilities from the previous exercise.
Now calculate the probabilistic relational neighbor probability of being a man (gender = 1) for every node.
Display the output in a data frame with the columns nodes and prob_man.
Store the data frame in df.
To download the graph from the dataframe click: here1
To download the PadelNetwork click: here2
Assume that PadelNetwork from the previous exercise and g have been loaded.
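One possible way to organize this calculation is sketched below. It is not the reference solution: the helper `prn_probability`, the toy adjacency matrix `toy_adj` and the probabilities `toy_probs` are all hypothetical, and the sketch assumes that a 0/1 adjacency matrix with node names can be obtained from the network.

```r
# Hypothetical helper, not part of the exercise material.
# adjacency: 0/1 matrix; probs: probability of class 1 for each node.
prn_probability <- function(adjacency, probs) {
  stopifnot(nrow(adjacency) == length(probs))
  pos <- as.vector(crossprod(adjacency, probs))      # summed class-1 probabilities of neighbors
  neg <- as.vector(crossprod(adjacency, 1 - probs))  # summed class-0 probabilities of neighbors
  pos / (pos + neg)                                  # NaN for nodes without neighbors
}

# Hypothetical network: P1 - P2 - P3 form a path, P4 is isolated.
toy_adj <- matrix(c(0, 1, 0, 0,
                    1, 0, 1, 0,
                    0, 1, 0, 0,
                    0, 0, 0, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("P", 1:4), paste0("P", 1:4)))

# Hypothetical probabilities of being a man, one per node.
toy_probs <- c(0.9, 0.3, 0.5, 0.7)

# Data frame with the column names requested in the exercise.
df_toy <- data.frame(nodes = rownames(toy_adj),
                     prob_man = prn_probability(toy_adj, toy_probs))
print(df_toy)
```

For the exercise itself, the same call would be made with the adjacency matrix and probabilities of the PadelNetwork, and the result assigned to df. How to treat isolated nodes, which yield NaN here, is a design choice the exercise does not specify.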