When using the tm package, you don’t need to perform part-of-speech (POS) tagging, since this package has no functionality that requires it. However, other packages do offer POS tagging: qdap and udpipe. For didactic reasons we will perform this on the non-spellchecked reviews.
The qdap package uses the Penn Treebank tag set. We will use the pos function:
p_load(qdap)
posdat <- pos(reviews)
posdat
wrd.cnt CC CD DT IN JJ JJR JJS MD NN NNS PRP PRP$ RB RBR RBS TO VB VBD VBG VBN VBP VBZ WRB
1 71 3(4.2%) 2(2.8%) 10(14.1%) 5(7.0%) 6(8.5%) 1(1.4%) 1(1.4%) 1(1.4%) 9(12.7%) 4(5.6%) 7(9.9%) 0 5(7.0%) 0 0 2(2.8%) 3(4.2%) 1(1.4%) 1(1.4%) 2(2.8%) 3(4.2%) 5(7.0%) 0
2 23 2(8.7%) 0 1(4.3%) 2(8.7%) 1(4.3%) 1(4.3%) 0 0 4(17.4%) 0 1(4.3%) 1(4.3%) 4(17.4%) 0 0 1(4.3%) 1(4.3%) 1(4.3%) 1(4.3%) 0 0 2(8.7%) 0
3 98 5(5.1%) 0 12(12.2%) 13(13.3%) 12(12.2%) 1(1.0%) 0 0 23(23.5%) 0 5(5.1%) 3(3.1%) 8(8.2%) 0 0 3(3.1%) 2(2.0%) 2(2.0%) 2(2.0%) 1(1.0%) 1(1.0%) 4(4.1%) 1(1.0%)
4 336 19(5.7%) 0 36(10.7%) 50(14.9%) 30(8.9%) 0 0 3(.9%) 58(17.3%) 11(3.3%) 17(5.1%) 8(2.4%) 32(9.5%) 1(.3%) 1(.3%) 12(3.6%) 14(4.2%) 5(1.5%) 9(2.7%) 5(1.5%) 9(2.7%) 13(3.9%) 3(.9%)
5 13 1(7.7%) 0 1(7.7%) 1(7.7%) 3(23.1%) 0 0 0 3(23.1%) 0 0 1(7.7%) 0 0 0 1(7.7%) 1(7.7%) 0 0 0 0 1(7.7%) 0
The output above shows the number of words in each document (wrd.cnt), together with the count and percentage of each POS tag in that document. If you want to know what each tag stands for, you can use the following function.
pos_tags()
Tag Description
1 CC Coordinating conjunction 19 PRP$ Possessive pronoun
2 CD Cardinal number 20 RB Adverb
3 DT Determiner 21 RBR Adverb, comparative
4 EX Existential there 22 RBS Adverb, superlative
5 FW Foreign word 23 RP Particle
6 IN Preposition or subordinating conjunction 24 SYM Symbol
7 JJ Adjective 25 TO to
8 JJR Adjective, comparative 26 UH Interjection
9 JJS Adjective, superlative 27 VB Verb, base form
10 LS List item marker 28 VBD Verb, past tense
11 MD Modal 29 VBG Verb, gerund or present participle
12 NN Noun, singular or mass 30 VBN Verb, past participle
13 NNS Noun, plural 31 VBP Verb, non-3rd person singular present
14 NNP Proper noun, singular 32 VBZ Verb, 3rd person singular present
15 NNPS Proper noun, plural 33 WDT Wh-determiner
16 PDT Predeterminer 34 WP Wh-pronoun
17 POS Possessive ending 35 WP$ Possessive wh-pronoun
18 PRP Personal pronoun 36 WRB Wh-adverb
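The tag table can also serve as a lookup table for translating a sequence of tags into readable descriptions. The following is a minimal base R sketch, assuming a hand-copied subset of the table above stored in a named vector; the names tag_desc, tags and described are hypothetical and not part of qdap.

```r
# Hand-copied subset of the tag table above (not the full pos_tags() output).
tag_desc <- c(
  JJ     = "Adjective",
  NN     = "Noun, singular or mass",
  TO     = "to",
  VB     = "Verb, base form",
  CC     = "Coordinating conjunction",
  VBZ    = "Verb, 3rd person singular present",
  "PRP$" = "Possessive pronoun",
  IN     = "Preposition or subordinating conjunction",
  DT     = "Determiner"
)

# A short tag sequence; hypothetical input for illustration only.
tags <- c("JJ", "NN", "TO", "VB", "PRP$", "NN")

# Look up each tag by name; tags missing from tag_desc come back as NA.
described <- data.frame(
  tag = tags,
  description = unname(tag_desc[tags]),
  stringsAsFactors = FALSE
)
print(described)
```

Indexing a named vector by tag keeps the original order of the sequence, which a merge would not guarantee.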
Let’s look at some commonly used methods for the object returned by pos: preprocessed, counts and proportions.
preprocessed(posdat)
POStagged
1 two/CD monthlong/JJ trips/NNS abroad/RB this/DT is/VBZ the/DT best/JJS it/PRP take/VB a/DT little/JJ while/NN to/TO get/VB used/VBN to/TO the/DT smaller/JJR keyboard/NN but/CC once/RB you/PRP do/VBP it/PRP works/VBZ flawlessly/RB the/DT charge/NN lasts/VBZ a/DT very/RB long/JJ time/NN months/NNS they/PRP say/VBP no/DT problem/NN not/RB recharging/VBG it/PRP for/IN weeks/NNS of/IN constant/JJ use/NN solid/JJ looks/VBZ good/JJ and/CC protects/VBZ the/DT ipad/NN i/NN couldnt/MD survive/VB without/IN it/PRP i/IN havent/PRP tried/VBD any/DT others/NNS but/CC i/NN am/VBP sold/VBN on/IN this/DT one/CD
2 this/DT is/VBZ nearly/RB as/RB heavy/JJ as/IN my/PRP$ laptop/NN and/CC i/NN was/VBD hoping/VBG to/TO find/VB something/NN lighter/JJR for/IN travel/NN but/CC it/PRP works/VBZ well/RB anyway/RB
3 wonderfully/RB thin/JJ light/NN and/CC durable/JJ the/DT keyboard/NN works/VBZ extremely/RB well/RB for/IN me/PRP my/PRP$ only/RB wish/VB about/IN this/DT is/VBZ that/IN the/DT angle/NN was/VBD not/RB quite/RB so/RB steep/JJ when/WRB open/JJ or/CC perhaps/RB adjustable/JJ if/IN i/PRP hold/VBP it/PRP on/IN my/PRP$ lap/NN with/IN the/DT front/JJ edge/NN of/IN the/DT keyboard/NN at/IN my/PRP$ navel/NN that/IN tilts/VBZ it/PRP to/TO an/DT acceptable/JJ angle/NN if/IN using/VBG it/PRP on/IN a/DT keyboard/NN tray/NN or/CC table/NN i/NN put/VBD a/DT postit/NN pad/NN or/CC similar/JJ item/NN under/IN the/DT front/JJ edge/NN to/TO put/VB the/DT view/NN angle/NN to/TO something/NN more/JJR sensible/NN but/CC overall/JJ this/DT is/VBZ much/JJ nicer/NN vs/IN the/DT keyboard/NN cover/NN id/JJ been/VBN using/VBG
4 this/DT keyboardcase/NN cover/NN is/VBZ absolutely/RB fabulous/JJ it/PRP works/VBZ so/RB well/RB and/CC it/PRP so/RB convenient/JJ and/CC stylish/JJ im/NN the/DT envy/NN of/IN all/DT of/IN my/PRP$ friends/NNS some/DT have/VBP even/RB mistaken/VBN the/DT combo/NN of/IN my/PRP$ ipad/NN and/CC this/DT keyboard/NN for/IN a/DT netbook/NN and/CC are/VBP amazed/VBN when/WRB im/PRP able/JJ to/TO so/RB easily/RB and/CC quickly/RB change/VB postions/NNS from/IN horizontal/JJ to/TO vertical/JJ and/CC then/RB snap/VB on/IN the/DT magnets/NNS to/TO coverclose/VB my/PRP$ ipad/NN and/CC on/IN the/DT go/NN in/IN seconds/NNS i/VBP also/RB love/VB how/WRB the/DT outer/JJ aluminum/NN casing/NN matching/VBG that/DT of/IN the/DT ipad/NN and/CC even/RB if/IN you/PRP add/VBP a/DT skin/NN like/IN i/NN did/VBD it/PRP still/RB looks/VBZ very/RB professional/JJ and/CC classysleek/JJ with/IN apples/NNS original/JJ design/NN im/IN a/DT college/NN student/NN and/CC this/DT very/RB versatile/JJ product/NN has/VBZ been/VBN really/RB amazingly/RB helpful/JJ easy/JJ to/TO type/NN on/IN and/CC efficient/JJ with/IN taking/VBG notes/NNS in/IN class/NN and/CC in/IN helping/VBG me/PRP to/TO use/VB my/PRP$ ipad/NN as/IN a/DT netbook/NN at/IN times/NNS and/CC also/RB to/TO detached/JJ and/CC use/NN is/VBZ solo/RB as/IN it/PRP was/VBD originally/RB intended/VBN very/RB versatile/JJ amazing/JJ product/NN i/NNS have/VBP highly/RB recommended/VBN this/DT to/TO others/NNS though/IN it/PRP does/VBZ most/RBS of/IN the/DT selling/VBG itself/PRP ps/VBZ my/PRP$ only/JJ con/NN is/VBZ that/IN the/DT ipad/NN isnt/NN as/IN secure/JJ in/IN this/DT keyboarddevice/NN in/IN the/DT vertical/JJ position/NN as/IN it/PRP is/VBZ horiztonally/RB it/PRP seems/VBZ that/IN it/PRP only/RB clicks/VBZ into/IN postion/NN with/IN a/DT locking/VBG secure/JJ feel/NN in/IN the/DT horizontal/JJ position/NN if/IN they/PRP were/VBD to/TO maybe/RB add/VB some/DT center/NN magnets/NNS to/TO the/DT keyboard/NN docking/NN areaslot/NN i/IN think/VBP it/PRP would/MD lock/VB into/IN place/NN and/CC function/NN more/RBR securely/RB since/IN being/VBG vertical/JJ makes/VBZ the/DT ipad/NN too/RB top/JJ heavy/NN for/IN the/DT keyboard/NN additionally/RB my/PRP$ keyboard/NN did/VBD actually/RB fall/VB outback/RB and/CC down/RB from/IN the/DT keyboard/NN docking/NN area/NN when/WRB in/IN vertical/JJ position/NN before/IN but/CC i/IN think/VBP it/PRP had/VBD to/TO do/VB with/IN the/DT pressing/VBG of/IN my/PRP$ fingers/NNS on/IN the/DT ipad/NN while/IN in/IN vertical/JJ position/NN not/RB a/DT problem/NN in/IN horizontal/JJ position/NN if/IN you/PRP are/VBP going/VBG to/TO touch/VB your/PRP$ screen/NN a/DT lot/NN and/CC type/NN i/NN would/MD suggest/VB the/DT horizontal/JJ position/NN but/CC if/IN you/PRP just/RB plan/VBP to/TO type/VB without/IN the/DT touching/VBG on/IN the/DT screen/NN i/VBZ think/VB the/DT vertical/JJ position/NN should/MD work/VB fine/JJ
5 great/JJ case/NN easy/JJ to/TO use/VB thin/JJ and/CC turns/VBZ my/PRP$ ipad/NN into/IN a/DT macbook/NN
POStags word.count
1 CD, JJ, NNS, RB, DT, VBZ, DT, JJS, PRP, VB, DT, JJ, NN, TO, VB, VBN, TO, DT, JJR, NN, CC, RB, PRP, VBP, PRP, VBZ, RB, DT, NN, VBZ, DT, RB, JJ, NN, NNS, PRP, VBP, DT, NN, RB, VBG, PRP, IN, NNS, IN, JJ, NN, JJ, VBZ, JJ, CC, VBZ, DT, NN, NN, MD, VB, IN, PRP, IN, PRP, VBD, DT, NNS, CC, NN, VBP, VBN, IN, DT, CD 71
2 DT, VBZ, RB, RB, JJ, IN, PRP$, NN, CC, NN, VBD, VBG, TO, VB, NN, JJR, IN, NN, CC, PRP, VBZ, RB, RB 23
3 RB, JJ, NN, CC, JJ, DT, NN, VBZ, RB, RB, IN, PRP, PRP$, RB, VB, IN, DT, VBZ, IN, DT, NN, VBD, RB, RB, RB, JJ, WRB, JJ, CC, RB, JJ, IN, PRP, VBP, PRP, IN, PRP$, NN, IN, DT, JJ, NN, IN, DT, NN, IN, PRP$, NN, IN, VBZ, PRP, TO, DT, JJ, NN, IN, VBG, PRP, IN, DT, NN, NN, CC, NN, NN, VBD, DT, NN, NN, CC, JJ, NN, IN, DT, JJ, NN, TO, VB, DT, NN, NN, TO, NN, JJR, NN, CC, JJ, DT, VBZ, JJ, NN, IN, DT, NN, NN, JJ, VBN, VBG 98
4 DT, NN, NN, VBZ, RB, JJ, PRP, VBZ, RB, RB, CC, PRP, RB, JJ, CC, JJ, NN, DT, NN, IN, DT, IN, PRP$, NNS, DT, VBP, RB, VBN, DT, NN, IN, PRP$, NN, CC, DT, NN, IN, DT, NN, CC, VBP, VBN, WRB, PRP, JJ, TO, RB, RB, CC, RB, VB, NNS, IN, JJ, TO, JJ, CC, RB, VB, IN, DT, NNS, TO, VB, PRP$, NN, CC, IN, DT, NN, IN, NNS, VBP, RB, VB, WRB, DT, JJ, NN, NN, VBG, DT, IN, DT, NN, CC, RB, IN, PRP, VBP, DT, NN, IN, NN, VBD, PRP, RB, VBZ, RB, JJ, CC, JJ, IN, NNS, JJ, NN, IN, DT, NN, NN, CC, DT, RB, JJ, NN, VBZ, VBN, RB, RB, JJ, JJ, TO, NN, IN, CC, JJ, IN, VBG, NNS, IN, NN, CC, IN, VBG, PRP, TO, VB, PRP$, NN, IN, DT, NN, IN, NNS, CC, RB, TO, JJ, CC, NN, VBZ, RB, IN, PRP, VBD, RB, VBN, RB, JJ, JJ, NN, NNS, VBP, RB, VBN, DT, TO, NNS, IN, PRP, VBZ, RBS, IN, DT, VBG, PRP, VBZ, PRP$, JJ, NN, VBZ, IN, DT, NN, NN, IN, JJ, IN, DT, NN, IN, DT, JJ, NN, IN, PRP, VBZ, RB, PRP, VBZ, IN, PRP, RB, VBZ, IN, NN, IN, DT, VBG, JJ, NN, IN, DT, JJ, NN, IN, PRP, VBD, TO, RB, VB, DT, NN, NNS, TO, DT, NN, NN, NN, IN, VBP, PRP, MD, VB, IN, NN, CC, NN, RBR, RB, IN, VBG, JJ, VBZ, DT, NN, RB, JJ, NN, IN, DT, NN, RB, PRP$, NN, VBD, RB, VB, RB, CC, RB, IN, DT, NN, NN, NN, WRB, IN, JJ, NN, IN, CC, IN, VBP, PRP, VBD, TO, VB, IN, DT, VBG, IN, PRP$, NNS, IN, DT, NN, IN, IN, JJ, NN, RB, DT, NN, IN, JJ, NN, IN, PRP, VBP, VBG, TO, VB, PRP$, NN, DT, NN, CC, NN, NN, MD, VB, DT, JJ, NN, CC, IN, PRP, RB, VBP, TO, VB, IN, DT, VBG, IN, DT, NN, VBZ, VB, DT, JJ, NN, MD, VB, JJ 336
5 JJ, NN, JJ, TO, VB, JJ, CC, VBZ, PRP$, NN, IN, DT, NN 13
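The word/TAG strings in the POStagged column are easy to take apart when you need the individual tokens, for example to extract all adjectives. Below is a minimal base R sketch, assuming the tagged text of the fifth review is available as a plain character string (copied here from the output above); the object names are hypothetical.

```r
# Tagged text of review 5, copied from the preprocessed(posdat) output.
tagged <- "great/JJ case/NN easy/JJ to/TO use/VB thin/JJ and/CC turns/VBZ my/PRP$ ipad/NN into/IN a/DT macbook/NN"

# Split on spaces to get one "word/TAG" pair per element.
pairs <- strsplit(tagged, " ", fixed = TRUE)[[1]]

# Everything before the last slash is the word, everything after it the tag.
tokens <- data.frame(
  word = sub("/[^/]*$", "", pairs),
  tag  = sub("^.*/", "", pairs),
  stringsAsFactors = FALSE
)

# Keep only the adjectives.
adjectives <- tokens$word[tokens$tag == "JJ"]
print(adjectives)   # "great" "easy" "thin"

# Frequency of each tag in this review.
print(table(tokens$tag))
```

The number of rows in tokens equals the word.count of 13 reported for this review.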
counts(posdat)
wrd.cnt CC CD DT IN JJ JJR JJS MD NN NNS PRP PRP$ RB RBR RBS TO VB VBD VBG VBN VBP VBZ WRB
1 71 3 2 10 5 6 1 1 1 9 4 7 0 5 0 0 2 3 1 1 2 3 5 0
2 23 2 0 1 2 1 1 0 0 4 0 1 1 4 0 0 1 1 1 1 0 0 2 0
3 98 5 0 12 13 12 1 0 0 23 0 5 3 8 0 0 3 2 2 2 1 1 4 1
4 336 19 0 36 50 30 0 0 3 58 11 17 8 32 1 1 12 14 5 9 5 9 13 3
5 13 1 0 1 1 3 0 0 0 3 0 0 1 0 0 0 1 1 0 0 0 0 1 0
proportions(posdat)
wrd.cnt CC CD DT IN JJ JJR JJS MD NN NNS PRP PRP$ RB RBR RBS TO VB VBD VBG VBN VBP VBZ WRB
1 71 0.04225352 0.02816901 0.14084507 0.07042254 0.08450704 0.01408451 0.01408451 0.014084507 0.1267606 0.05633803 0.09859155 0.00000000 0.07042254 0.00000000 0.00000000 0.02816901 0.04225352 0.01408451 0.01408451 0.02816901 0.04225352 0.07042254 0.000000000
2 23 0.08695652 0.00000000 0.04347826 0.08695652 0.04347826 0.04347826 0.00000000 0.000000000 0.1739130 0.00000000 0.04347826 0.04347826 0.17391304 0.00000000 0.00000000 0.04347826 0.04347826 0.04347826 0.04347826 0.00000000 0.00000000 0.08695652 0.000000000
3 98 0.05102041 0.00000000 0.12244898 0.13265306 0.12244898 0.01020408 0.00000000 0.000000000 0.2346939 0.00000000 0.05102041 0.03061224 0.08163265 0.00000000 0.00000000 0.03061224 0.02040816 0.02040816 0.02040816 0.01020408 0.01020408 0.04081633 0.010204082
4 336 0.05654762 0.00000000 0.10714286 0.14880952 0.08928571 0.00000000 0.00000000 0.008928571 0.1726190 0.03273810 0.05059524 0.02380952 0.09523810 0.00297619 0.00297619 0.03571429 0.04166667 0.01488095 0.02678571 0.01488095 0.02678571 0.03869048 0.008928571
5 13 0.07692308 0.00000000 0.07692308 0.07692308 0.23076923 0.00000000 0.00000000 0.000000000 0.2307692 0.00000000 0.00000000 0.07692308 0.00000000 0.00000000 0.00000000 0.07692308 0.07692308 0.00000000 0.00000000 0.00000000 0.00000000 0.07692308 0.000000000
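The proportions are the counts divided by the word count of the document. The following base R sketch checks this for the first two reviews, using a few columns copied by hand from the counts output above; it assumes nothing about qdap internals, and the object names are hypothetical.

```r
# Counts for reviews 1 and 2, copied from the counts(posdat) output
# (only three tag columns are kept to keep the sketch short).
cnt <- data.frame(
  wrd.cnt = c(71, 23),
  CC = c(3, 2),
  DT = c(10, 1),
  NN = c(9, 4)
)

# Divide every tag column by the word count of its document (row-wise).
tag_cols <- setdiff(names(cnt), "wrd.cnt")
prop <- cnt[tag_cols] / cnt$wrd.cnt

print(round(prop, 8))        # same values as in the proportions output
print(round(100 * prop, 1))  # percentages as shown in the pos output
```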
Together, the counts and proportions functions show the same information as the output of the pos function. On top of this information, it is also possible to make plots.
plot(preprocessed(posdat))
The udpipe package provides two kinds of POS tags: the universal tags of the Universal Dependencies project and the treebank-specific tags. The advantage is that it works with multiple languages (including Dutch). Because the models are language-specific, you first have to download the model for the language you need.
p_load(udpipe)
udmodel <- udpipe_download_model(language = "english")
Next, you can run the udpipe function to tokenize, POS-tag, lemmatize and dependency-parse the data.
parsed <- udpipe(reviews, object = udmodel)
head(parsed)
doc_id paragraph_id sentence_id
1 doc1 1 1
2 doc1 1 1
3 doc1 1 1
4 doc1 1 1
5 doc1 1 1
6 doc1 1 1
sentence
1 two monthlong trips abroad this is the best it take a little while to get used to the smaller keyboard but once you do it works flawlessly the charge lasts a very long time months they say no problem not recharging it for weeks of constant use solid looks good and protects the ipad i couldnt survive without it i havent tried any others but i am sold on this one
2 two monthlong trips abroad this is the best it take a little while to get used to the smaller keyboard but once you do it works flawlessly the charge lasts a very long time months they say no problem not recharging it for weeks of constant use solid looks good and protects the ipad i couldnt survive without it i havent tried any others but i am sold on this one
3 two monthlong trips abroad this is the best it take a little while to get used to the smaller keyboard but once you do it works flawlessly the charge lasts a very long time months they say no problem not recharging it for weeks of constant use solid looks good and protects the ipad i couldnt survive without it i havent tried any others but i am sold on this one
4 two monthlong trips abroad this is the best it take a little while to get used to the smaller keyboard but once you do it works flawlessly the charge lasts a very long time months they say no problem not recharging it for weeks of constant use solid looks good and protects the ipad i couldnt survive without it i havent tried any others but i am sold on this one
5 two monthlong trips abroad this is the best it take a little while to get used to the smaller keyboard but once you do it works flawlessly the charge lasts a very long time months they say no problem not recharging it for weeks of constant use solid looks good and protects the ipad i couldnt survive without it i havent tried any others but i am sold on this one
6 two monthlong trips abroad this is the best it take a little while to get used to the smaller keyboard but once you do it works flawlessly the charge lasts a very long time months they say no problem not recharging it for weeks of constant use solid looks good and protects the ipad i couldnt survive without it i havent tried any others but i am sold on this one
start end term_id token_id token lemma upos xpos feats head_token_id dep_rel deps misc
1 1 3 1 1 two two NUM CD NumType=Card 3 nummod <NA> <NA>
2 5 13 2 2 monthlong monthlong ADJ JJ Degree=Pos 3 amod <NA> <NA>
3 15 19 3 3 trips trip NOUN NNS Number=Plur 8 nsubj <NA> <NA>
4 21 26 4 4 abroad abroad ADV RB <NA> 5 advmod <NA> <NA>
5 28 31 5 5 this this PRON DT Number=Sing|PronType=Dem 8 nsubj <NA> <NA>
6 33 34 6 6 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 8 cop <NA> <NA>
Let’s have a look at the data for document 2. The POS tags are stored in the columns upos (Universal Dependencies tags) and xpos (treebank-specific tags).
p_load(dplyr)
head(parsed %>% filter(doc_id == 'doc2'))
doc_id paragraph_id sentence_id sentence
1 doc2 1 1 this is nearly as heavy as my laptop and i was hoping to find something lighter for travel but it works well anyway
2 doc2 1 1 this is nearly as heavy as my laptop and i was hoping to find something lighter for travel but it works well anyway
3 doc2 1 1 this is nearly as heavy as my laptop and i was hoping to find something lighter for travel but it works well anyway
4 doc2 1 1 this is nearly as heavy as my laptop and i was hoping to find something lighter for travel but it works well anyway
5 doc2 1 1 this is nearly as heavy as my laptop and i was hoping to find something lighter for travel but it works well anyway
6 doc2 1 1 this is nearly as heavy as my laptop and i was hoping to find something lighter for travel but it works well anyway
start end term_id token_id token lemma upos xpos feats head_token_id dep_rel deps misc
1 1 4 1 1 this this PRON DT Number=Sing|PronType=Dem 5 nsubj <NA> <NA>
2 6 7 2 2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 5 cop <NA> <NA>
3 9 14 3 3 nearly nearly ADV RB <NA> 5 advmod <NA> <NA>
4 16 17 4 4 as as ADV RB <NA> 5 advmod <NA> <NA>
5 19 23 5 5 heavy heavy ADJ JJ Degree=Pos 0 root <NA> <NA>
6 25 26 6 6 as as ADP IN <NA> 8 case <NA> <NA>
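To see how the two tag columns relate, it can help to tabulate them. The sketch below is base R only and assumes a small data frame typed in by hand from the six rows printed above, so it runs without udpipe; the object name doc2_head is hypothetical.

```r
# First six tokens of document 2, copied from the output above.
doc2_head <- data.frame(
  token = c("this", "is", "nearly", "as", "heavy", "as"),
  lemma = c("this", "be", "nearly", "as", "heavy", "as"),
  upos  = c("PRON", "AUX", "ADV", "ADV", "ADJ", "ADP"),
  xpos  = c("DT", "VBZ", "RB", "RB", "JJ", "IN"),
  stringsAsFactors = FALSE
)

# Number of tokens per universal tag.
print(table(doc2_head$upos))

# Cross-tabulation of universal tags against treebank tags.
print(table(upos = doc2_head$upos, xpos = doc2_head$xpos))

# Lemmas of the adjectives and adverbs only.
print(doc2_head$lemma[doc2_head$upos %in% c("ADJ", "ADV")])
```

Note that the same token can receive different tags depending on context: the first "as" is tagged ADV/RB, the second ADP/IN.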
p_load(textplot, ggraph)
textplot_dependencyparser(parsed %>% filter(doc_id == 'doc1'))
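The relations drawn in such a plot can also be read directly from the columns token_id, head_token_id and dep_rel. The following base R sketch resolves each head id to its token, assuming the first five rows of document 2 typed in by hand from the earlier output and assuming the id columns are stored as character; the object name dep is hypothetical.

```r
# First five tokens of document 2 with their dependency information.
dep <- data.frame(
  token_id      = c("1", "2", "3", "4", "5"),
  token         = c("this", "is", "nearly", "as", "heavy"),
  head_token_id = c("5", "5", "5", "5", "0"),
  dep_rel       = c("nsubj", "cop", "advmod", "advmod", "root"),
  stringsAsFactors = FALSE
)

# Look up the token that each head_token_id points to.
# A head id of "0" marks the root and has no matching token, so match() gives NA.
dep$head_token <- dep$token[match(dep$head_token_id, dep$token_id)]
dep$head_token[is.na(dep$head_token)] <- "ROOT"

# Print each relation as relation(head, dependent).
relations <- paste0(dep$dep_rel, "(", dep$head_token, ", ", dep$token, ")")
print(relations)
```

In this fragment every token attaches to "heavy", which is itself the root of the sentence.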
Which of the following statements is correct according to the output shown above?
reviews contains 5 documents, of which the fourth document contains the highest number of words.
To download the productreviews dataset, click here [1].