You can also take a transfer learning approach for the word embeddings by supplying a pretrained embedding matrix. Here we reuse the English Wikipedia word2vec model built previously. Note that the package requires the word embeddings and the document embeddings to have the same dimension.
# Use a pretrained embedding matrix
model <- read.word2vec(file = 'model.bin', normalize = TRUE)
# Build a PV-DM model with transfer learning
pvdm <- paragraph2vec(x = doc,
                      type = "PV-DM",
                      dim = 300,
                      iter = 20,
                      min_count = 5,
                      lr = 0.05,
                      threads = 1,
                      embeddings = as.matrix(model))
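Because the dimensions must match, it can be useful to inspect the pretrained matrix before calling paragraph2vec. A minimal sketch, assuming the vectors stored in model.bin are 300-dimensional to match dim = 300 above:
# Sanity check (illustrative): the pretrained word vectors must have
# as many columns as the 'dim' argument passed to paragraph2vec()
emb <- as.matrix(model)
stopifnot(ncol(emb) == 300)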
Get the document embeddings that can be used in a predictive model.
# Retrieve document embeddings for predictive modeling
embeddings <- as.matrix(pvdm, which = "docs")
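As an illustration (not part of the package), these document embeddings can be fed to any classifier. The sketch below assumes a hypothetical binary outcome y, aligned with the rows of embeddings, and fits a penalized logistic regression with glmnet:
# Illustrative only: 'y' is an assumed binary label per document,
# in the same order as rownames(embeddings)
library(glmnet)
fit  <- cv.glmnet(x = embeddings, y = y, family = "binomial")
pred <- predict(fit, newx = embeddings, s = "lambda.min", type = "response")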