Next, we fit a naive Bayes model to the Smarket data. Naive Bayes is implemented
in R using the naiveBayes() function, which is part of the e1071
library. The syntax is identical to that of lda() and qda(). By default, this
implementation of the naive Bayes classifier models each quantitative feature
using a Gaussian distribution. However, a kernel density method can
also be used to estimate the distributions.
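To sketch what the kernel density alternative looks like in practice, the separate naivebayes package (an assumption here; it is not part of e1071) provides a `usekernel` argument that replaces the Gaussian class-conditional densities with kernel density estimates:

```r
# Sketch, assuming the 'naivebayes' package is installed and that
# Smarket and the logical training index 'train' exist as in this lab.
library(naivebayes)
nb.kde <- naive_bayes(Direction ~ Lag1 + Lag2,
                      data = Smarket[train, ],
                      usekernel = TRUE)  # kernel density estimates per class
```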
> library(e1071)
> nb.fit <- naiveBayes(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
> nb.fit
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
    Down       Up
0.491984 0.508016

Conditional probabilities:
      Lag1
Y            [,1]     [,2]
  Down  0.04279022 1.227446
  Up   -0.03954635 1.231668

      Lag2
Y            [,1]     [,2]
  Down  0.03389409 1.239191
  Up   -0.03132544 1.220765
The output contains the estimated mean (first column) and standard deviation (second column) of each variable in each class. For example, the mean of Lag1 within the Direction=Down class is 0.0428, and its standard deviation is 1.23. We can easily verify this:
> mean(Lag1[train][Direction[train] == "Down"])
[1] 0.0428
> sd(Lag1[train][Direction[train] == "Down"])
[1] 1.23
The predict() function is straightforward.
> nb.class <- predict(nb.fit, Smarket.2005)
> table(nb.class, Direction.2005)
Direction.2005
nb.class Down Up
Down 28 20
Up 83 121
> mean(nb.class == Direction.2005)
[1] 0.591
Naive Bayes performs very well on this data, producing accurate predictions over 59% of the time. This is slightly worse than QDA, but much better than LDA.
The predict() function can also generate estimates of the probability that each observation belongs to a particular class.

Use the predict() function on the nb.fit model with the Smarket.2005 data set, including the extra argument type = "raw", to generate the estimated probability of each class. Store the predictions in nb.preds.

Assume that:
- the e1071 and ISLR2 libraries have been loaded
- the Smarket data set has been loaded and attached
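One way to carry out this step, using only objects defined earlier in this lab (nb.fit and Smarket.2005), is:

```r
# Posterior class probabilities: with type = "raw", predict() for a
# naiveBayes fit returns a matrix with one row per observation and
# one column per class ("Down" and "Up"), each row summing to 1.
nb.preds <- predict(nb.fit, Smarket.2005, type = "raw")
head(nb.preds)  # inspect the first few probability estimates
```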