Next, we fit a naive Bayes model to the Smarket data. Naive Bayes is implemented in R using the naiveBayes() function, which is part of the e1071 library. The syntax is identical to that of lda() and qda(). By default, this implementation of the naive Bayes classifier models each quantitative feature using a Gaussian distribution. However, a kernel density method can also be used to estimate the distributions.
> library(e1071)
> nb.fit <- naiveBayes(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
> nb.fit
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
    Down       Up
0.491984 0.508016

Conditional probabilities:
      Lag1
Y            [,1]     [,2]
  Down  0.04279022 1.227446
  Up   -0.03954635 1.231668

      Lag2
Y            [,1]     [,2]
  Down  0.03389409 1.239191
  Up   -0.03132544 1.220765
The output contains the estimated mean and standard deviation for each variable in each class. For example, the mean for Lag1 is 0.0428 for Direction=Down, and the standard deviation is 1.23. We can easily verify this:
> mean(Lag1[train][Direction[train] == "Down"])
[1] 0.0428
> sd(Lag1[train][Direction[train] == "Down"])
[1] 1.23
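The values verified above are the Gaussian means and standard deviations that naiveBayes() fits by default. To experiment with the kernel density alternative mentioned at the start of this section, a different implementation is needed, since the e1071 function itself does not expose a kernel option. The lines below are a hedged sketch only, assuming the klaR package and its NaiveBayes() function with usekernel = TRUE (neither is part of this lab, and argument details may differ by version):
> # Sketch (assumes klaR is installed): naive Bayes with kernel density estimates
> library(klaR)
> nb.kern <- NaiveBayes(Direction ~ Lag1 + Lag2, data = Smarket[train, ], usekernel = TRUE)
> kern.class <- predict(nb.kern, Smarket.2005)$class
> mean(kern.class == Direction.2005)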
The predict() function is straightforward.
> nb.class <- predict(nb.fit, Smarket.2005)
> table(nb.class, Direction.2005)
        Direction.2005
nb.class Down  Up
    Down   28  20
    Up     83 121
> mean(nb.class == Direction.2005)
[1] 0.591
Naive Bayes performs very well on this data, with accurate predictions over 59% of the time. This is slightly worse than QDA, but much better than LDA.
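The QDA and LDA accuracies referenced here come from earlier sections of this chapter. If those objects are no longer in memory, the comparison can be recomputed with the MASS functions whose syntax naiveBayes() mirrors; this sketch assumes MASS is loaded and reuses the same training split:
> # Sketch: refit LDA and QDA on the training data and compute test accuracy
> library(MASS)
> lda.fit <- lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
> mean(predict(lda.fit, Smarket.2005)$class == Direction.2005)
> qda.fit <- qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
> mean(predict(qda.fit, Smarket.2005)$class == Direction.2005)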
The predict() function can also generate estimates of the probability that each observation belongs to a particular class.
Use the predict() function on the nb.fit model with the Smarket.2005 data set and include the extra parameter type = "raw" to generate the estimates of probability for each class. Store the predictions in nb.preds; one possible approach is sketched below.

Assume that:
- the e1071 and ISLR2 libraries have been loaded
- the Smarket dataset has been loaded and attached
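A minimal sketch of one way to complete this, assuming the nb.fit model and Smarket.2005 test set defined above (the first few rows are printed only to inspect the result):
> # Sketch: posterior class probabilities on the 2005 test data
> nb.preds <- predict(nb.fit, Smarket.2005, type = "raw")
> nb.preds[1:5, ]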