if (!require("clickstream")) install.packages("clickstream", quiet=TRUE) ; require("clickstream")
We will build a Markov model from the following clickstream data, trying to predict the state of the next click based on the previous clicks. In a clickstream context, a state can be a visit to a particular web page (e.g., "P1"), a purchase ("Buy"), or a deferral ("Defer"). A transition can mean staying on the same page, moving to another page, or leaving the site altogether.
cls <- list(Session1 = c("P1", "P2", "P1", "P3", "P4", "Defer"),
Session2 = c("P3", "P4", "P1", "P3", "Defer"),
Session3 = c("P5", "P1", "P6", "P7", "P6", "P7", "P8", "P7", "Buy"),
Session4 = c("P9", "P2", "P11", "P12", "P11", "P13", "P11", "Buy"),
Session5 = c("P4", "P6", "P11", "P6", "P1", "P3", "Defer"),
Session6 = c("P3", "P13", "P12", "P4", "P12", "P1", "P4", "P1", "P3", "Defer"),
Session7 = c("P10", "P5", "P10", "P8", "P8", "P5", "P1", "P7", "Buy"),
Session8 = c("P9", "P2", "P1", "P9", "P3", "P1", "Defer"),
Session9 = c("P5", "P8", "P5", "P7", "P4", "P1", "P6", "P4", "Defer"))
class(cls) <- "Clickstreams"
In this example, the user in Session 1 starts on page P1, then visits page P2, and so on; the session ends with the user deferring. Next, we build Markov chains: probabilistic models that capture dependencies between successive observations of a random variable. First, we fit a Zero-Order Markov Chain, whose probabilities are simply the relative frequencies of the states in the data.
mc0 <- fitMarkovChain(clickstreamList = cls, order = 0, control = list(optimizer = "quadratic"))
mc0
Zero-Order Markov Chain
Probabilities:
states frequency probability
1 Buy 3 0.04285714
2 Defer 6 0.08571429
3 P1 11 0.15714286
4 P10 2 0.02857143
5 P11 4 0.05714286
6 P12 3 0.04285714
7 P13 2 0.02857143
8 P2 3 0.04285714
9 P3 7 0.10000000
10 P4 7 0.10000000
11 P5 5 0.07142857
12 P6 5 0.07142857
13 P7 5 0.07142857
14 P8 4 0.05714286
15 P9 3 0.04285714
Start Probabilities: P1 P10 P3 P4 P5 P9
0.1111111 0.1111111 0.2222222 0.1111111 0.2222222 0.2222222
End Probabilities: Buy Defer
0.3333333 0.6666667
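Because a Zero-Order Markov Chain ignores the order of the clicks entirely, these probabilities should coincide with the relative frequencies of the states, which we can verify in base R:
# Relative frequency of each state across all 70 recorded clicks;
# this should reproduce the probability column of mc0 above
round(prop.table(table(unlist(cls))), 4)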
Next, we fit a First-Order Markov Chain, which models the transition probabilities from one state to the next.
mc1 <- fitMarkovChain(clickstreamList = cls, order = 1, control = list(optimizer = "quadratic"))
mc1
First-Order Markov Chain
Transition Probabilities:
Lag: 1
lambda: 1
Buy Defer P1 P10 P11 P12 P13 P2 P3 P4 P5 P6 P7 P8 P9
Buy 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.4 0.00 0.0000000
Defer 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.4285714 0.2857143 0.0 0.0 0.0 0.00 0.0000000
P1 0 0 0.00000000 0.0 0.00 0.3333333 0.0 0.6666667 0.1428571 0.4285714 0.4 0.2 0.0 0.00 0.0000000
P10 0 0 0.00000000 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.0 0.00 0.0000000
P11 0 0 0.00000000 0.0 0.00 0.3333333 0.5 0.3333333 0.0000000 0.0000000 0.0 0.2 0.0 0.00 0.0000000
P12 0 0 0.00000000 0.0 0.25 0.0000000 0.5 0.0000000 0.0000000 0.1428571 0.0 0.0 0.0 0.00 0.0000000
P13 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.1428571 0.0000000 0.0 0.0 0.0 0.00 0.0000000
P2 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.6666667
P3 0 0 0.36363636 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.3333333
P4 0 0 0.09090909 0.0 0.00 0.3333333 0.0 0.0000000 0.2857143 0.0000000 0.0 0.2 0.2 0.00 0.0000000
P5 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.50 0.0000000
P6 0 0 0.18181818 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.1428571 0.0 0.0 0.2 0.00 0.0000000
P7 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.4 0.0 0.25 0.0000000
P8 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.2 0.25 0.0000000
P9 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.0000000
Start Probabilities: P1 P10 P3 P4 P5 P9
0.1111111 0.1111111 0.2222222 0.1111111 0.2222222 0.2222222
End Probabilities: Buy Defer
0.3333333 0.6666667
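Individual transition probabilities can also be read directly from the fitted object. A minimal sketch, assuming the MarkovChain object stores one transition matrix per lag in its transitions slot, with the printed orientation (rows are the next state, columns the current state):
# Probability that the next click is P1 given that the current page is P12;
# this is the entry in row P1, column P12 of the lag-1 matrix above (about 0.333)
mc1@transitions[[1]]["P1", "P12"]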
From this First-Order Markov Chain we learn, for instance, that the probability of visiting page P1 when the visitor is currently on page P12 is 33.33%. Lastly, we fit a Second-Order Markov Chain; note that there is now a separate transition matrix and a lambda weight for each lag.
mc2 <- fitMarkovChain(clickstreamList = cls, order = 2, control = list(optimizer = "quadratic"))
mc2
Higher-Order Markov Chain (order=2)
Transition Probabilities:
Lag: 1
lambda: 0.2188383
Buy Defer P1 P10 P11 P12 P13 P2 P3 P4 P5 P6 P7 P8 P9
Buy 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.4 0.00 0.0000000
Defer 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.4285714 0.2857143 0.0 0.0 0.0 0.00 0.0000000
P1 0 0 0.00000000 0.0 0.00 0.3333333 0.0 0.6666667 0.1428571 0.4285714 0.4 0.2 0.0 0.00 0.0000000
P10 0 0 0.00000000 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.0 0.00 0.0000000
P11 0 0 0.00000000 0.0 0.00 0.3333333 0.5 0.3333333 0.0000000 0.0000000 0.0 0.2 0.0 0.00 0.0000000
P12 0 0 0.00000000 0.0 0.25 0.0000000 0.5 0.0000000 0.0000000 0.1428571 0.0 0.0 0.0 0.00 0.0000000
P13 0 0 0.00000000 0.0 0.25 0.0000000 0.0 0.0000000 0.1428571 0.0000000 0.0 0.0 0.0 0.00 0.0000000
P2 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.6666667
P3 0 0 0.36363636 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.3333333
P4 0 0 0.09090909 0.0 0.00 0.3333333 0.0 0.0000000 0.2857143 0.0000000 0.0 0.2 0.2 0.00 0.0000000
P5 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.50 0.0000000
P6 0 0 0.18181818 0.0 0.25 0.0000000 0.0 0.0000000 0.0000000 0.1428571 0.0 0.0 0.2 0.00 0.0000000
P7 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.4 0.0 0.25 0.0000000
P8 0 0 0.00000000 0.5 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.2 0.0 0.2 0.25 0.0000000
P9 0 0 0.09090909 0.0 0.00 0.0000000 0.0 0.0000000 0.0000000 0.0000000 0.0 0.0 0.0 0.00 0.0000000
Lag: 2
lambda: 0.7811617
Buy Defer P1 P10 P11 P12 P13 P2 P3 P4 P5 P6 P7 P8 P9
Buy 0 0 0.1 0.0 0.0000000 0.0000000 0.5 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.25 0.0000000
Defer 0 0 0.3 0.0 0.0000000 0.0000000 0.0 0.0000000 0.50 0.0 0.0 0.2 0.0000000 0.00 0.0000000
P1 0 0 0.2 0.0 0.3333333 0.0000000 0.0 0.0000000 0.25 0.2 0.0 0.0 0.3333333 0.25 0.6666667
P10 0 0 0.0 0.5 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P11 0 0 0.0 0.0 0.6666667 0.0000000 0.0 0.0000000 0.00 0.2 0.0 0.0 0.0000000 0.00 0.3333333
P12 0 0 0.0 0.0 0.0000000 0.3333333 0.0 0.3333333 0.25 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P13 0 0 0.0 0.0 0.0000000 0.3333333 0.0 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P2 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
P3 0 0 0.1 0.0 0.0000000 0.0000000 0.0 0.3333333 0.00 0.4 0.0 0.2 0.0000000 0.00 0.0000000
P4 0 0 0.2 0.0 0.0000000 0.3333333 0.5 0.0000000 0.00 0.0 0.2 0.0 0.0000000 0.00 0.0000000
P5 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.2 0.0 0.0000000 0.25 0.0000000
P6 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.2 0.2 0.4 0.0000000 0.00 0.0000000
P7 0 0 0.1 0.0 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.2 0.0 0.6666667 0.25 0.0000000
P8 0 0 0.0 0.5 0.0000000 0.0000000 0.0 0.0000000 0.00 0.0 0.2 0.2 0.0000000 0.00 0.0000000
P9 0 0 0.0 0.0 0.0000000 0.0000000 0.0 0.3333333 0.00 0.0 0.0 0.0 0.0000000 0.00 0.0000000
Start Probabilities: P1 P10 P3 P4 P5 P9
0.1111111 0.1111111 0.2222222 0.1111111 0.2222222 0.2222222
End Probabilities: Buy Defer
0.3333333 0.6666667
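In a higher-order chain, the lag-specific transition matrices are combined using the lambda weights (a mixture-transition-distribution form): the probability of the next state is, roughly, the lag-1 lambda times the lag-1 probability plus the lag-2 lambda times the lag-2 probability. A minimal sketch of this weighting, assuming the transitions slot holds one matrix per lag and the lambda slot holds the corresponding weights:
# Visitor's two most recent clicks: the click before last was P4, the current page is P3.
# Approximate probability that the next click is P1 under the mixture form:
p_lag1 <- mc2@transitions[[1]]["P1", "P3"]   # P(next = P1 | current page = P3)
p_lag2 <- mc2@transitions[[2]]["P1", "P4"]   # P(next = P1 | page before that = P4)
mc2@lambda[1] * p_lag1 + mc2@lambda[2] * p_lag2   # roughly 0.19
In day-to-day use, the package's predict() method performs this weighting for you; for example, predict(mc2, new("Pattern", sequence = c("P4", "P3")), dist = 1) should return the most probable next page for that partial session.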
Build the Third-Order Markov Chain and store it as mc3. In addition, extract the lambda of the third lag and store it as L3.