Next, we fit a naive Bayes model to the Smarket data. Naive Bayes is implemented
in R using the naiveBayes() function, which is part of the e1071
library. The syntax is identical to that of lda() and qda(). By default, this
implementation of the naive Bayes classifier models each quantitative feature
using a Gaussian distribution. However, a kernel density method can
also be used to estimate the distributions.
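To sketch what the kernel density alternative looks like in practice, the separate naivebayes package (an assumption here; it is not part of e1071) provides a `usekernel` argument that replaces the Gaussian class-conditional densities with kernel density estimates:

```r
# Sketch, assuming the 'naivebayes' package is installed and that
# Smarket and the logical training index 'train' exist as in this lab.
library(naivebayes)
nb.kde <- naive_bayes(Direction ~ Lag1 + Lag2,
                      data = Smarket[train, ],
                      usekernel = TRUE)  # kernel density estimates per class
```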
> library(e1071)
> nb.fit <- naiveBayes(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
> nb.fit
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
    Down       Up
0.491984 0.508016

Conditional probabilities:
      Lag1
Y            [,1]     [,2]
  Down  0.04279022 1.227446
  Up   -0.03954635 1.231668

      Lag2
Y            [,1]     [,2]
  Down  0.03389409 1.239191
  Up   -0.03132544 1.220765
The output contains the estimated mean (first column) and standard deviation (second column) of each variable in each class. For example, the mean of Lag1 within the Direction=Down class is 0.0428, and its standard deviation is 1.23. We can easily verify this:
> mean(Lag1[train][Direction[train] == "Down"])
[1] 0.0428
> sd(Lag1[train][Direction[train] == "Down"])
[1] 1.23
The predict() function is straightforward.
> nb.class <- predict(nb.fit, Smarket.2005)
> table(nb.class, Direction.2005)
Direction.2005
nb.class Down Up
Down 28 20
Up 83 121
> mean(nb.class == Direction.2005)
[1] 0.591
Naive Bayes performs very well on this data, producing accurate predictions over 59% of the time. This is slightly worse than QDA, but much better than LDA.
The predict() function can also generate estimates of the probability that each observation belongs to a particular class.

Use the predict() function on the nb.fit model with the Smarket.2005 data set, including the extra argument type = "raw", to generate the estimated probability of each class. Store the predictions in nb.preds.

Assume that:
- the e1071 and ISLR2 libraries have been loaded
- the Smarket data set has been loaded and attached
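One way to carry out this step, using only objects defined earlier in this lab (nb.fit and Smarket.2005), is:

```r
# Posterior class probabilities: with type = "raw", predict() for a
# naiveBayes fit returns a matrix with one row per observation and
# one column per class ("Down" and "Up"), each row summing to 1.
nb.preds <- predict(nb.fit, Smarket.2005, type = "raw")
head(nb.preds)  # inspect the first few probability estimates
```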