First, we load the required packages and read in the data. We also set the encoding and take a look at the data.

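# Install pacman if it is missing, then load it
if (!require("pacman")) install.packages("pacman") ; require("pacman")
# tidyverse supplies read_csv(), %>% and glimpse() used below
p_load(tidyverse, SnowballC, slam, tm, RWeka, Matrix)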

SentimentReal <- read_csv("SentimentReal.csv")
Encoding(SentimentReal$message) <- 'latin1'
SentimentReal %>% glimpse()
Rows: 95
Columns: 2
$ message <chr> "Well your chief executive should go. I wont be donating money to your charity again", "really desperate now its 7 ye~
$ label   <dbl> -1, -1, 1, -1, 1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1~
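
The effect of declaring an encoding can be seen in isolation on a single string. The following is a minimal sketch in base R, not part of the original analysis; the variables `x` and `x_utf8` are illustrative names, and the example string is chosen only because it contains one non-ASCII Latin-1 byte.

```r
# Illustration only: a string containing the Latin-1 byte E7 (c with cedilla).
x <- "fa\xE7ile"
Encoding(x)                # the string carries no Latin-1 mark yet

# Declare how the bytes should be interpreted; the bytes themselves do not change.
Encoding(x) <- "latin1"
Encoding(x)

# Convert to UTF-8 so that later text-processing steps see one consistent encoding.
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)
```

Note that `Encoding<-` only marks the string; an actual conversion between encodings is done by functions such as `enc2utf8()` or `iconv()`.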

The data consists of comments together with their labels, each assigned by a single rater. Using these labeled comments, we will build a model that can then be applied to the unlabeled comments. This is a supervised approach, in contrast to the unsupervised lexicon-based approach. We first take a closer look at the first rows of the complete data set.

head(SentimentReal) 
# A tibble: 6 x 2
  message                                                                                                                         label
  <chr>                                                                                                                           <dbl>
1 "Well your chief executive should go. I wont be donating money to your charity again"                                              -1
2 "really desperate now its 7 years on and this as just come to public domain you are a disgusting vile  company hope this is th~    -1
3 "I believe"                                                                                                                         1
4 "So unbelievably sad that the enormous amount of good work carried out in desperately needy locations throughout the world by ~    -1
5 "Am from Kenya I would like to be a volunteer."                                                                                     1
6 "Feeling bit cross as a customer and supporter of many charities to discover OXFAM have been behaving is such poor way on the ~    -1
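
Before training a supervised model, it can help to check how the labels are distributed, since a strong imbalance between -1 and 1 affects how later results should be read. The following is a minimal sketch, assuming a small data frame `toy` built from three of the rows shown above; the names `toy`, `label_counts` and `label_share` are hypothetical and do not appear in the original code.

```r
# Three rows copied from the head() output above, for illustration only.
toy <- data.frame(
  message = c(
    "Well your chief executive should go. I wont be donating money to your charity again",
    "I believe",
    "Am from Kenya I would like to be a volunteer."
  ),
  label = c(-1, 1, 1)
)

# Absolute and relative frequency of each sentiment label
label_counts <- table(toy$label)
label_share  <- prop.table(label_counts)

label_counts
label_share
```

On the full data, the same two calls could be applied to `SentimentReal$label`.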

Exercise

Rate the following tweets about fruit yourself. Assign -1 to a tweet with a negative sentiment and +1 to a tweet with a positive sentiment.

According to the WHO, a healthy lifestyle includes at least three pieces of fruit a day.
My favorite fruit are banana's.
The price of strawberries is way to high this year!

To download the SentimentReal dataset click here¹.