In the following exercise we will perform sentiment analysis using the sentimentr package. As an example, we introduce three short sentences that are stored as mytext. As always, we start by loading the required packages (p_load is provided by the pacman package).
p_load(tidyverse, textclean, textstem, sentimentr, lexicon)
mytext <- c(
'Do you like analytics? But I really hate programming.',
'Google is my best friend.',
'Do you really like data analytics? I\'m a huge fan'
)
mytext %>% get_sentences() %>% sentiment()
element_id sentence_id word_count sentiment
1: 1 1 4 0.2500000
2: 1 2 5 -1.0230011
3: 2 1 5 0.5813777
4: 3 1 6 0.3674235
5: 3 2 4 0.0000000
The sentiment() function outputs the sentiment per sentence, together with the word count. The sentiment score is based on the lexicon::hash_sentiment_jockers_rinker polarity table, whose word scores range from -2 (very negative) to 1 (very positive). The first sentence of the example contains 4 words and has a sentiment of 0.25.
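To see where these scores come from, the polarity table can be inspected directly: it is a data.table with the words in column x and their scores in column y. A quick sketch, assuming the lexicon package is installed:

```r
library(lexicon)

# The polarity table used by sentiment(): words in x, scores in y
pol <- hash_sentiment_jockers_rinker

# Look up the scores of some words from our example sentences
pol[x %in% c('like', 'hate', 'friend')]

# Confirm the range of the word scores
range(pol$y)
```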
To aggregate by row (document), we use the sentiment_by function. You can also specify your own 'by' variable.
mytext %>% get_sentences() %>% sentiment_by()
element_id word_count sd ave_sentiment
1: 1 9 0.9001477 -0.3865005
2: 2 5 NA 0.5813777
3: 3 10 0.2598076 0.2004980
By default, the sentiment_by function downweights zeros when averaging, because you don't want the neutral sentences to have a strong influence on the result. Other options are average_weighted_mixed_sentiment, which upweights the negatives and downweights the neutrals, and average_mean.
mytext %>% get_sentences() %>% sentiment_by(averaging.function = average_weighted_mixed_sentiment)
element_id word_count sd ave_sentiment
1: 1 9 0.9001477 -1.9210022
2: 2 5 NA 0.5813777
3: 3 10 0.2598076 0.2004980
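The first row can be reproduced by hand: average_weighted_mixed_sentiment multiplies the negative sentence scores by a weight before averaging (4 by default, set via its mixed.less.than.zero.weight argument):

```r
# Sentence scores of the first document, taken from the sentiment() output above
scores <- c(0.2500000, -1.0230011)

# Upweight the negative score by the default factor of 4, then take the mean
weighted <- ifelse(scores < 0, 4 * scores, scores)
mean(weighted)
# approximately -1.9210022, the ave_sentiment reported above
```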
mytext %>% get_sentences() %>% sentiment_by(averaging.function = average_mean)
element_id word_count sd ave_sentiment
1: 1 9 0.9001477 -0.3865005
2: 2 5 NA 0.5813777
3: 3 10 0.2598076 0.1837117
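average_mean, by contrast, is the plain arithmetic mean of the sentence scores, zeros included. For the third document:

```r
# Sentence scores of the third document, taken from the sentiment() output above
mean(c(0.3674235, 0.0000000))
# approximately 0.1837117, the ave_sentiment reported above
```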
Let's add some emoticons, word elongations, and exclamation marks to mytext, and store the result as mytext2.
mytext2 <- c(
'Do you like analytics? But I really hate programming :(.',
'Google is my beeeeeeeest friend!',
'Do you really like data analytics? I\'m a huge fan.'
)
Notice that the emoticon is not detected and the elongated word beeeeeeeest is not seen as an intensifier.
mytext2 %>% get_sentences() %>% sentiment_by()
element_id word_count sd ave_sentiment
1: 1 9 0.9001477 -0.3865005
2: 2 5 NA 0.3577709
3: 3 10 0.2598076 0.2004980
Emoticons and word elongations have to be replaced first.
mytext2 %>% replace_emoticon()
[1] "Do you like analytics? But I really hate programming frown ." "Google is my beeeeeeeest friend!"
[3] "Do you really like data analytics? I'm a huge fan."
mytext2 %>% replace_word_elongation()
[1] "Do you like analytics? But I really hate programming :(." "Google is my best friend!"
[3] "Do you really like data analytics? I'm a huge fan."
We extract sentiment of the adapted text.
mytext2 %>% replace_emoticon() %>% replace_word_elongation() %>% get_sentences() %>% sentiment_by()
element_id word_count sd ave_sentiment
1: 1 10 1.2773506 -0.6532233
2: 2 5 NA 0.5813777
3: 3 10 0.2598076 0.2004980
The exclamation mark is still not detected by the sentiment function. Therefore, we replace '!' with its corresponding meaning, 'exclamation', and add this word to the valence shifter table. To do so, we update hash_valence_shifters with the new amplifier. The update_valence_shifter_table function takes a two-column data.frame (named x and y), with the first column being character and containing the words, and the second column being integer corresponding to: 1 = negator, 2 = amplifier (intensifier), 3 = de-amplifier (downtoner), and 4 = adversative conjunction.
valence_shifters_updated <- update_valence_shifter_table(key = hash_valence_shifters,
                                                         x = data.frame(x = 'exclamation', y = 2))
Now let’s see whether this is added to our sentiment function.
mytext2 %>%
str_replace_all('!', ' exclamation') %>%
replace_emoticon() %>%
replace_word_elongation() %>%
get_sentences() %>%
sentiment_by(valence_shifters_dt = valence_shifters_updated)
element_id word_count sd ave_sentiment
1: 1 10 1.2773506 -0.6532233
2: 2 6 NA 0.9553010
3: 3 10 0.2598076 0.2004980
We see that the sentiment of the second element gets more positive with our updated valence shifter table. Compare this to the result with the default table:
mytext2 %>% str_replace_all('!', ' exclamation') %>%
replace_emoticon() %>%
replace_word_elongation() %>%
get_sentences() %>%
sentiment_by()
element_id word_count sd ave_sentiment
1: 1 10 1.2773506 -0.6532233
2: 2 6 NA 0.5307228
3: 3 10 0.2598076 0.2004980
Here the word 'exclamation' has no amplifying effect, because the default valence shifter table does not contain it.
Let's work with three tweets about fruit. Compute the sentiment per sentence, the average sentiment per row using the average_weighted_mixed_sentiment function, and the mean of the sentiment per row using the average_mean function, and store the results as sentiment_sentence, sentiment_row_avg, and sentiment_row_mean, respectively. Remove the emoticons and word elongations from the tweets and compute the sentiment per sentence. Store it as sentiment_sentence_cleaned.
Assume that: