Now the knn() function can be used to predict the market’s movement for the dates in 2005. We set a random seed before we apply knn() because if several observations are tied as nearest neighbors, then R will randomly break the tie. Therefore, a seed must be set in order to ensure reproducibility of results.

> set.seed(1)
> knn.pred <- knn(train.X, test.X, train.Direction, k = 1)
> 
> table(knn.pred, Direction.2005)
        Direction.2005
knn.pred Down Up
    Down   43 58
    Up     68 83
> (83 + 43) / 252
[1] 0.5

The results using K = 1 are not very good, since only 50% of the observations are correctly predicted. Of course, it may be that K = 1 results in an overly flexible fit to the data. Below, we repeat the analysis using K = 3.

> knn.pred <- knn(train.X, test.X, train.Direction, k = 3)
> table(knn.pred, Direction.2005)
        Direction.2005
knn.pred Down Up
    Down   48 54
    Up     63 87
> mean(knn.pred == Direction.2005)
[1] 0.5357143

The results have improved slightly. But increasing K further turns out to provide no additional improvement. It appears that for this data, QDA provides the best results of the methods that we have examined so far.
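One quick way to check this claim is to refit the model for a range of K values and compare the resulting test accuracies. The sketch below assumes that train.X, test.X, train.Direction, and Direction.2005 from the code above are still in the workspace; the particular K values tried are an arbitrary choice for illustration.

> # Refit KNN for several values of K and print the test accuracy for each
> # (assumes train.X, test.X, train.Direction, and Direction.2005 exist)
> set.seed(1)
> for (k in c(1, 3, 5, 10, 20)) {
+   knn.pred <- knn(train.X, test.X, train.Direction, k = k)
+   cat("K =", k, "accuracy =", round(mean(knn.pred == Direction.2005), 4), "\n")
+ }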

Try fitting a KNN model with K = 10, and store the confusion matrix in knn.table and the accuracy in knn.acc:

Assume that: