First, we load the required packages and read in the data. We also set the encoding and take a look at the data.

if (!require("pacman")) install.packages("pacman") ; require("pacman")
# readr provides read_csv(); dplyr provides %>% and glimpse()
p_load(SnowballC, slam, tm, RWeka, Matrix, readr, dplyr)

SentimentReal <- read_csv("SentimentReal.csv")
Encoding(SentimentReal$message) <- 'latin1'  # 'latin' is not a recognized encoding mark
SentimentReal %>% glimpse()
Rows: 95
Columns: 2
$ message <chr> "Well your chief executive should go. I wont be donating money to your charity again", "really desperate now its 7 ye~
$ label   <dbl> -1, -1, 1, -1, 1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1~
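
To make the encoding step more concrete, here is a minimal sketch on a made-up string (not taken from the data set). It assumes the raw bytes are Latin-1; the names `x` and `x_utf8` are purely illustrative.

```r
# A toy string whose last byte (0xE9) is "é" in Latin-1.
x <- "caf\xe9"

# Declare how the bytes should be interpreted; the bytes themselves are unchanged.
Encoding(x) <- "latin1"

# Convert to UTF-8 so the text is handled consistently in later steps.
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)   # "UTF-8"
nchar(x_utf8)      # 4
```

Declaring an encoding only tells R how to read the existing bytes, so it is worth checking a few non-ASCII comments by eye after this step.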

The data consist of comments together with a sentiment label (-1 for negative, 1 for positive). Each label was assigned by a single rater. Using these labeled comments, we will build a model that can then be applied to unlabeled comments. This is a supervised approach, in contrast to the unsupervised lexicon-based approach. We take a closer look at the first rows of the complete data set.

head(SentimentReal) 
# A tibble: 6 x 2
  message                                                                                                                         label
  <chr>                                                                                                                           <dbl>
1 "Well your chief executive should go. I wont be donating money to your charity again"                                              -1
2 "really desperate now its 7 years on and this as just come to public domain you are a disgusting vile  company hope this is th~    -1
3 "I believe"                                                                                                                         1
4 "So unbelievably sad that the enormous amount of good work carried out in desperately needy locations throughout the world by ~    -1
5 "Am from Kenya I would like to be a volunteer."                                                                                     1
6 "Feeling bit cross as a customer and supporter of many charities to discover OXFAM have been behaving is such poor way on the ~    -1
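
Before modeling, it can help to check how the two classes are distributed. The sketch below uses only the six labels visible in the output above, typed in by hand as `label_head` (a hypothetical name); on the full data one would presumably pass `SentimentReal$label` instead.

```r
# Labels of the six rows shown by head(), entered manually for illustration.
label_head <- c(-1, -1, 1, -1, 1, -1)

# Absolute counts per class.
counts <- table(label_head)
counts

# Relative frequencies per class.
shares <- prop.table(counts)
round(shares, 2)
```

A heavily unbalanced split is worth knowing about, since plain accuracy can then look good for a model that mostly predicts the majority class.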

Exercise

Rate the following tweets about fruit yourself: assign -1 to a tweet with a negative sentiment and +1 to a tweet with a positive sentiment.

  1. According to the WHO, a healthy lifestyle includes at least three pieces of fruit a day.
  2. My favorite fruits are bananas.
  3. The price of strawberries is way too high this year!

To download the SentimentReal dataset click here.