STEP 5. Modelling using Markov chains

First, we make a copy of our data for a subsequent analysis. Next, we assign the class Clickstreams to our list of clickstreams.

clstr2 <- clstr
class(clstr) <- "Clickstreams"

We fit a zero-order Markov chain, which describes the overall distribution of the pages in our data, and a first-order Markov chain, which is described by a transition matrix.

mc <- fitMarkovChain(clickstreamList = clstr,
                     order = 0,
                     control = list(optimizer = "quadratic"))

mc1 <- fitMarkovChain(clickstreamList = clstr2,
                      order = 1,
                      control = list(optimizer = "quadratic"))
Error in fitMarkovChain(clickstreamList = clstr2, order = 1, control = list(optimizer = "quadratic")) :
  The order is too high for the specified clickstreams.

The first-order fit fails because some clickstreams consist of a single page view, so they contain no transition from which a first-order MC could be estimated. Therefore, we keep only the clickstreams with more than one page view (these can be, for instance, several visits to the MMA page).

clstr2 <- clstr2[map_int(clstr2, length) > 1]
class(clstr2) <- "Clickstreams"
mc1 <- fitMarkovChain(clickstreamList = clstr2,
                      order = 1,
                      control = list(optimizer = "quadratic"))
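To make the difference between the two orders concrete, here is a minimal base-R sketch on made-up data. The sessions and page names are hypothetical, and the code only counts clicks by hand; it is not how fitMarkovChain works internally.

```r
# Hypothetical toy sessions; the page names are made up for illustration.
sessions <- list(
  u1 = c("home", "projects", "home"),
  u2 = c("home"),                                   # single click: no transition
  u3 = c("projects", "home", "projects", "contact", "home")
)

# Zero-order view: relative frequency of each page over all clicks.
clicks <- unlist(sessions, use.names = FALSE)
state_dist <- prop.table(table(clicks))

# First-order view: keep sessions with more than one click ...
multi <- sessions[lengths(sessions) > 1]

# ... and pair every click with the click that follows it.
from <- unlist(lapply(multi, head, -1), use.names = FALSE)
to   <- unlist(lapply(multi, tail, -1), use.names = FALSE)

states <- sort(unique(clicks))
counts <- table(from = factor(from, levels = states),
                to   = factor(to, levels = states))

# Row-normalise the counts: each row is the distribution of the next page.
trans <- prop.table(counts, margin = 1)
print(state_dist)
print(trans)
```

Session u2 contributes to the zero-order distribution but has no (from, to) pair, which is why it has to be dropped before a first-order chain can be fitted.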

We make a Markov chain of order 2, which has two transition matrices, one for each lag. This requires clickstreams with more than two page views.

clstr2 <- clstr2[map_int(clstr2, length) > 2]
# Subsetting drops the class attribute, so we assign it again
class(clstr2) <- "Clickstreams"
mc2 <- fitMarkovChain(clickstreamList = clstr2,
                      order = 2,
                      control = list(optimizer = "quadratic"))
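The idea of one transition matrix per lag can be sketched by hand in base R. This is only an illustration under assumptions: the sessions, the helper lag_matrix and the weights lambda are hypothetical, and we assume that the lags are combined as a weighted mixture. The package estimates its own weights, which this sketch does not reproduce.

```r
# Hypothetical toy sessions, each with more than two clicks.
sessions <- list(
  u1 = c("home", "projects", "contact", "home", "projects"),
  u2 = c("projects", "home", "projects", "contact")
)
states <- sort(unique(unlist(sessions)))

# Hypothetical helper: column-normalised matrix of transitions between
# clicks that are `lag` steps apart (columns = from, rows = to).
lag_matrix <- function(sessions, lag, states) {
  from <- unlist(lapply(sessions, head, -lag), use.names = FALSE)
  to   <- unlist(lapply(sessions, tail, -lag), use.names = FALSE)
  counts <- table(to   = factor(to, levels = states),
                  from = factor(from, levels = states))
  prop.table(counts, margin = 2)
}

Q1 <- lag_matrix(sessions, 1, states)  # previous click -> current click
Q2 <- lag_matrix(sessions, 2, states)  # click before that -> current click

# Assumed mixture weights (made up, not estimated).
lambda <- c(0.7, 0.3)

# Next-click distribution when the last click was "projects"
# and the click before it was "home".
pred <- lambda[1] * Q1[, "projects"] + lambda[2] * Q2[, "home"]
print(round(pred, 3))
```

With real data, a page that never occurs as a "from" state at a given lag would produce a column of NaN values; the toy sessions are chosen so that this does not happen.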

Next, we analyze the results in more detail.

plot(mc2, order = 2)

[Figure: plot of the Markov chain mc2]

View(t(mc2@transitions[[1]]))

	analytics/CI_start.htm	analytics/Graduates.htm	analytics/IT_backbone.htm	analytics/IT_frontend.htm	analytics/Keybenefits.htm	analytics/Projects.htm
analytics/CI_start.htm	0.00000000	0.1964285714	0.00000000	0.00000000	0.00000000	0.0000000000
analytics/Graduates.htm	0.00000000	0.0000000000	0.01754386	0.00000000	0.26315789	0.0175438596
analytics/IT_backbone.htm	0.00000000	0.0204081633	0.00000000	0.57142857	0.00000000	0.0000000000
analytics/IT_frontend.htm	0.00000000	0.0000000000	0.18367347	0.00000000	0.00000000	0.0000000000
analytics/Keybenefits.htm	0.01886792	0.0377358491	0.00000000	0.00000000	0.00000000	0.5283018868
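One way to read such a table is to look up, for every row, the column with the highest probability. The sketch below retypes only the part of the matrix shown above, with shortened page names (the analytics/ prefix and .htm suffix are dropped for readability). It assumes that, after transposing, rows are the current page and columns the next page.

```r
# Shortened page names; values copied from the visible part of the table.
pages <- c("CI_start", "Graduates", "IT_backbone",
           "IT_frontend", "Keybenefits", "Projects")

shown <- matrix(c(
  0,          0.1964285714, 0,          0,          0,          0,
  0,          0,            0.01754386, 0,          0.26315789, 0.0175438596,
  0,          0.0204081633, 0,          0.57142857, 0,          0,
  0,          0,            0.18367347, 0,          0,          0,
  0.01886792, 0.0377358491, 0,          0,          0,          0.5283018868
), nrow = 5, byrow = TRUE, dimnames = list(pages[1:5], pages))

# Most likely next page per row, among the columns shown here.
best <- pages[apply(shown, 1, which.max)]
names(best) <- rownames(shown)
print(best)
```

Because only six columns of the full matrix are visible, the rows do not sum to 1, and the most likely next page in the full matrix may be one that is not shown.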

Exercise

Create a zero-order, a first-order, and a second-order Markov chain for the logs data and store them as mc, mc1, and mc2, respectively. Note that you need to filter the clstr variable, which you created in the previous exercise, to the users who have visited more than one page (for mc1) or more than two pages (for mc2).

Note: Don’t forget to define the Clickstreams class.

To download the all_logs_ugent dataset, click here¹.

To download the logs dataset, click here².

Assume that:

The clstr variable that was calculated in the previous exercise is given.