First, we load the required packages and read in the data. We then set the encoding of the message column and take a look at the data.
if (!require("pacman")) install.packages("pacman") ; require("pacman")
# tidyverse provides read_csv(), %>% and glimpse(), which are used below
p_load(tidyverse, SnowballC, slam, tm, RWeka, Matrix)
SentimentReal <- read_csv("SentimentReal.csv")
# "latin1" is the value Encoding() recognises for Latin-1 text
Encoding(SentimentReal$message) <- 'latin1'
SentimentReal %>% glimpse()
Rows: 95
Columns: 2
$ message <chr> "Well your chief executive should go. I wont be donating money to your charity again", "really desperate now its 7 ye~
$ label <dbl> -1, -1, 1, -1, 1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1~
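To make the encoding step more concrete, here is a minimal sketch in base R of what declaring an encoding does. The string is a made-up example and not part of the SentimentReal data. The values documented for `Encoding()` are "latin1", "UTF-8", "bytes" and "unknown".

```r
# Hypothetical example string: "café" built from raw Latin-1 bytes
# (0xe9 is the Latin-1 byte for "é").
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Declare how the bytes should be read; the bytes themselves do not change.
Encoding(x) <- "latin1"
Encoding(x)

# Convert the string to UTF-8 and inspect the result.
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)
nchar(x_utf8)
```

Declaring the encoding only changes how R interprets the stored bytes, whereas `enc2utf8()` actually re-encodes them.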
The data consist of comments together with their sentiment label, which was assigned by a single rater. Using these labeled comments, we will build a model that can be applied to the unlabeled comments. This is a supervised approach, in contrast to the unsupervised lexicon-based approach. We take a closer look at the first rows of the complete data set.
head(SentimentReal)
# A tibble: 6 x 2
message label
<chr> <dbl>
1 "Well your chief executive should go. I wont be donating money to your charity again" -1
2 "really desperate now its 7 years on and this as just come to public domain you are a disgusting vile company hope this is th~ -1
3 "I believe" 1
4 "So unbelievably sad that the enormous amount of good work carried out in desperately needy locations throughout the world by ~ -1
5 "Am from Kenya I would like to be a volunteer." 1
6 "Feeling bit cross as a customer and supporter of many charities to discover OXFAM have been behaving is such poor way on the ~ -1
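Since the label column is the target of the supervised model, it can be useful to check how the two classes are distributed and to hold out part of the labeled comments for evaluation. The sketch below is one possible way to do this in base R. It uses a small made-up data frame (`toy`) as a stand-in for SentimentReal, and the 70% training proportion is an assumption, not a choice taken from the text.

```r
# Hypothetical stand-in for SentimentReal: same columns, invented messages.
toy <- data.frame(
  message = c("great work", "terrible news", "thank you",
              "so sad", "I believe", "never again"),
  label = c(1, -1, 1, -1, 1, -1),
  stringsAsFactors = FALSE
)

# Class balance: absolute counts and proportions per label.
counts <- table(toy$label)
props <- prop.table(counts)

# Stratified split: sample 70% of the row indices within each label
# (assumed proportion), so both classes appear in the training set.
set.seed(123)
train_idx <- unlist(
  lapply(split(seq_len(nrow(toy)), toy$label), function(idx) {
    idx[sample.int(length(idx), size = floor(0.7 * length(idx)))]
  }),
  use.names = FALSE
)

train <- toy[train_idx, ]
test <- toy[-train_idx, ]
```

With only 95 labeled comments in the real data, any held-out set will be small, so the resulting performance estimates should be read with caution.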
Rate the following tweets about fruit yourself: assign -1 to a tweet with a negative sentiment and +1 to a tweet with a positive sentiment.
To download the SentimentReal dataset, click here.