if (!require("clickstream")) install.packages("clickstream", quiet=TRUE) ; require("clickstream")
We will build a Markov model from the following clickstream data, trying to predict the state of the next click based on the previous clicks. In a clickstream context, a state can be a visit to a particular web page (e.g., "P1"), a purchase ("Buy"), or a deferral ("Defer"). A transition can mean staying on the same page, moving to another page, or leaving the site altogether.
cls <- list(Session1 = c("P1", "P2", "P1", "P3", "P4", "Defer"),
Session2 = c("P3", "P4", "P1", "P3", "Defer"),
Session3 = c("P5", "P1", "P6", "P7", "P6", "P7", "P8", "P7", "Buy"),
Session4 = c("P9", "P2", "P11", "P12", "P11", "P13", "P11", "Buy"),
Session5 = c("P4", "P6", "P11", "P6", "P1", "P3", "Defer"),
Session6 = c("P3", "P13", "P12", "P4", "P12", "P1", "P4", "P1", "P3", "Defer"),
Session7 = c("P10", "P5", "P10", "P8", "P8", "P5", "P1", "P7", "Buy"),
Session8 = c("P9", "P2", "P1", "P9", "P3", "P1", "Defer"),
Session9 = c("P5", "P8", "P5", "P7", "P4", "P1", "P6", "P4", "Defer"))
class(cls) <- "Clickstreams"
In this example, the user in Session 1 starts on page P1, then visits page P2, and so on; the session ends with the user deferring. Next, we build Markov chains: probabilistic models that capture dependencies between successive observations of a random variable. First, we fit a Zero-Order Markov Chain, whose probabilities are simply the relative frequencies of the states in the data.
mc0 <- fitMarkovChain(clickstreamList = cls, order = 0, control = list(optimizer = "quadratic"))
mc0
Zero-Order Markov Chain
Probabilities:
states frequency probability
1 Buy 3 0.04285714
2 Defer 6 0.08571429
3 P1 11 0.15714286
4 P10 2 0.02857143
5 P11 4 0.05714286
6 P12 3 0.04285714
7 P13 2 0.02857143
8 P2 3 0.04285714
9 P3 7 0.10000000
10 P4 7 0.10000000
11 P5 5 0.07142857
12 P6 5 0.07142857
13 P7 5 0.07142857
14 P8 4 0.05714286
15 P9 3 0.04285714
Start Probabilities: P1 P10 P3 P4 P5 P9
0.1111111 0.1111111 0.2222222 0.1111111 0.2222222 0.2222222
End Probabilities: Buy Defer
0.3333333 0.6666667
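Because a Zero-Order Markov Chain ignores the order of the clicks entirely, these probabilities should coincide with the relative frequencies of the states, which we can verify in base R:
# Relative frequency of each state across all 70 recorded clicks;
# this should reproduce the probability column of mc0 above
round(prop.table(table(unlist(cls))), 4)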
Next, we fit a First-Order Markov Chain, which models the transition probabilities from one state to the next.
mc1 <- fitMarkovChain(clickstreamList = cls, order = 1, control = list(optimizer = "quadratic"))
mc1
First-Order Markov Chain
Transition Probabilities:
Lag: 1
lambda: 1
Buy Defer P1 P10 P11 P12 P13 P2 P3 P4 P5 P6 P7 P8 P9
Buy 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.4 0.00 0.0000000
Defer 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.4285714 0.2857143 0.0 0.0 0.0 0.00 0.0000000
P1 0 0 0.00000000 0.0 0.00 0.3333333 0.0 0.6666667 0.1428571 0.4285714 0.4 0.2 0.0 0.00 0.0000000
P10 0 0 0.00000000 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.0 0.00 0.0000000
P11 0 0 0.00000000 0.0 0.00 0.3333333 0.5 0.3333333 0.0000000 0.0000000 0.0 0.2 0.0 0.00 0.0000000
P12 0 0 0.00000000 0.0 0.25 0.0000000 0.5 0.0000000 0.0000000 0.1428571 0.0 0.0 0.0 0.00 0.0000000
P13 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.1428571 0.0000000 0.0 0.0 0.0 0.00 0.0000000
P2 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.6666667
P3 0 0 0.36363636 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.3333333
P4 0 0 0.09090909 0.0 0.00 0.3333333 0.0 0.0000000 0.2857143 0.0000000 0.0 0.2 0.2 0.00 0.0000000
P5 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.50 0.0000000
P6 0 0 0.18181818 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.1428571 0.0 0.0 0.2 0.00 0.0000000
P7 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.4 0.0 0.25 0.0000000
P8 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.2 0.25 0.0000000
P9 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.0000000
Start Probabilities: P1 P10 P3 P4 P5 P9
0.1111111 0.1111111 0.2222222 0.1111111 0.2222222 0.2222222
End Probabilities: Buy Defer
0.3333333 0.6666667
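Individual transition probabilities can also be read directly from the fitted object. A minimal sketch, assuming the MarkovChain object stores one transition matrix per lag in its transitions slot, with the printed orientation (rows are the next state, columns the current state):
# Probability that the next click is P1 given that the current page is P12;
# this is the entry in row P1, column P12 of the lag-1 matrix above (about 0.333)
mc1@transitions[[1]]["P1", "P12"]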
From this First-Order Markov Chain we learn, for instance, that the probability of visiting page P1 when the visitor is currently on page P12 is 33.33%. Lastly, we fit a Second-Order Markov Chain; note that there is now a separate transition matrix and a lambda weight for each lag.
mc2 <- fitMarkovChain(clickstreamList = cls, order = 2, control = list(optimizer = "quadratic"))
mc2
Higher-Order Markov Chain (order=2)
Transition Probabilities:
Lag: 1
lambda: 0.2188383
Buy Defer P1 P10 P11 P12 P13 P2 P3 P4 P5 P6 P7 P8 P9
Buy 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.4 0.00 0.0000000
Defer 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.4285714 0.2857143 0.0 0.0 0.0 0.00 0.0000000
P1 0 0 0.00000000 0.0 0.00 0.3333333 0.0 0.6666667 0.1428571 0.4285714 0.4 0.2 0.0 0.00 0.0000000
P10 0 0 0.00000000 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.0 0.00 0.0000000
P11 0 0 0.00000000 0.0 0.00 0.3333333 0.5 0.3333333 0.0000000 0.0000000 0.0 0.2 0.0 0.00 0.0000000
P12 0 0 0.00000000 0.0 0.25 0.0000000 0.5 0.0000000 0.0000000 0.1428571 0.0 0.0 0.0 0.00 0.0000000
P13 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.1428571 0.0000000 0.0 0.0 0.0 0.00 0.0000000
P2 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.6666667
P3 0 0 0.36363636 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.3333333
P4 0 0 0.09090909 0.0 0.00 0.3333333 0.0 0.0000000 0.2857143 0.0000000 0.0 0.2 0.2 0.00 0.0000000
P5 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.50 0.0000000
P6 0 0 0.18181818 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.1428571 0.0 0.0 0.2 0.00 0.0000000
P7 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.4 0.0 0.25 0.0000000
P8 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.2 0.25 0.0000000
P9 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.0000000
Lag: 2
lambda: 0.7811617
Buy Defer P1 P10 P11 P12 P13 P2 P3 P4 P5 P6 P7 P8 P9
Buy 0 0 0.1 0.0 0.0000000 0.0000000 0.5 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.25 0.0000000
Defer 0 0 0.3 0.0 0.0000000 0.0000000 0.0 0.0000000 0.50 0.0 0.0 0.2 0.0000000 0.00 0.0000000
P1 0 0 0.2 0.0 0.3333333 0.0000000 0.0 0.0000000 0.25 0.2 0.0 0.0 0.3333333 0.25 0.6666667
P10 0 0 0.0 0.5 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P11 0 0 0.0 0.0 0.6666667 0.0000000 0.0 0.0000000 0.00 0.2 0.0 0.0 0.0000000 0.00 0.3333333
P12 0 0 0.0 0.0 0.0000000 0.3333333 0.0 0.3333333 0.25 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P13 0 0 0.0 0.0 0.0000000 0.3333333 0.0 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P2 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P3 0 0 0.1 0.0 0.0000000 0.0000000 0.0 0.3333333 0.00 0.4 0.0 0.2 0.0000000 0.00 0.0000000
P4 0 0 0.2 0.0 0.0000000 0.3333333 0.5 0.0000000 0.00 0.0 0.2 0.0 0.0000000 0.00 0.0000000
P5 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.2 0.0 0.0000000 0.25 0.0000000
P6 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.2 0.2 0.4 0.0000000 0.00 0.0000000
P7 0 0 0.1 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.2 0.0 0.6666667 0.25 0.0000000
P8 0 0 0.0 0.5 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.2 0.2 0.0000000 0.00 0.0000000
P9 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.3333333 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
Start Probabilities: P1 P10 P3 P4 P5 P9
0.1111111 0.1111111 0.2222222 0.1111111 0.2222222 0.2222222
End Probabilities: Buy Defer
0.3333333 0.6666667
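In a higher-order chain, the lag-specific transition matrices are combined using the lambda weights (a mixture-transition-distribution form): the probability of the next state is, roughly, the lag-1 lambda times the lag-1 probability plus the lag-2 lambda times the lag-2 probability. A minimal sketch of this weighting, assuming the transitions slot holds one matrix per lag and the lambda slot holds the corresponding weights:
# Visitor's two most recent clicks: the click before last was P4, the current page is P3.
# Approximate probability that the next click is P1 under the mixture form:
p_lag1 <- mc2@transitions[[1]]["P1", "P3"]   # P(next = P1 | current page = P3)
p_lag2 <- mc2@transitions[[2]]["P1", "P4"]   # P(next = P1 | page before that = P4)
mc2@lambda[1] * p_lag1 + mc2@lambda[2] * p_lag2   # roughly 0.19
In day-to-day use, the package's predict() method performs this weighting for you; for example, predict(mc2, new("Pattern", sequence = c("P4", "P3")), dist = 1) should return the most probable next page for that partial session.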
Build the Third-Order Markov Chain and store it as mc3. In addition, extract the lambda of the third lag and store it as L3.