Now the knn() function can be used to predict the market’s movement for the dates in 2005. We set a random seed before we apply knn() because if several observations are tied as nearest neighbors, then R will randomly break the tie. Therefore, a seed must be set in order to ensure reproducibility of results.
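The tie-breaking behavior can be illustrated on a toy problem before turning to the market data. The sketch below is not part of the lab: the objects toy.train, toy.labels, toy.test, pred.a and pred.b are hypothetical names, and the data are made up so that one test point is exactly equidistant from two training points with different labels. It only assumes that the class library is installed.

```r
library(class)  # provides knn()

# Hypothetical toy data (not from Smarket): the test point at 1 is
# equidistant from the training points at 0 ("Down") and 2 ("Up").
toy.train  <- matrix(c(0, 2), ncol = 1)
toy.labels <- factor(c("Down", "Up"))
toy.test   <- matrix(1, ncol = 1)

# Resetting the same seed before each call fixes how the tie is broken,
# so the two predictions agree.
set.seed(1)
pred.a <- knn(toy.train, toy.test, toy.labels, k = 1)
set.seed(1)
pred.b <- knn(toy.train, toy.test, toy.labels, k = 1)
identical(pred.a, pred.b)

# Without resetting the seed, repeated calls may split between the classes.
table(replicate(20, as.character(knn(toy.train, toy.test, toy.labels, k = 1))))
```

Which label a given seed produces is not important here; the point is only that the same seed yields the same prediction.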
> set.seed(1)
> knn.pred <- knn(train.X, test.X, train.Direction, k = 1)
> table(knn.pred, Direction.2005)
        Direction.2005
knn.pred Down Up
    Down   43 58
    Up     68 83
> (83 + 43) / 252
[1] 0.5
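The hand calculation above adds the two diagonal entries of the confusion matrix and divides by the number of test observations. As a minimal sketch, the same arithmetic can be written so that it works for any confusion matrix; conf.k1 and acc.k1 are hypothetical names, and the counts are re-entered by hand from the K = 1 table rather than computed from the data.

```r
# Confusion matrix for K = 1, typed in from the table output
# (rows: predicted direction, columns: true direction in 2005).
conf.k1 <- matrix(c(43, 68, 58, 83), nrow = 2,
                  dimnames = list(knn.pred = c("Down", "Up"),
                                  Direction.2005 = c("Down", "Up")))

# Correct predictions sit on the diagonal; the total is the test-set size.
acc.k1 <- sum(diag(conf.k1)) / sum(conf.k1)
acc.k1  # 0.5, matching (83 + 43) / 252
```

When the prediction vector itself is available, mean(knn.pred == Direction.2005) gives the same fraction without building the table first.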
The results using K = 1 are not very good, since only 50% of the observations are correctly predicted. Of course, it may be that K = 1 results in an overly flexible fit to the data. Below, we repeat the analysis using K = 3.
> knn.pred <- knn(train.X, test.X, train.Direction, k = 3)
> table(knn.pred, Direction.2005)
        Direction.2005
knn.pred Down Up
    Down   48 54
    Up     63 87
> mean(knn.pred == Direction.2005)
[1] 0.5357143
The results have improved slightly, but increasing K further turns out to provide no further improvement. It appears that for these data, QDA provides the best results of the methods that we have examined so far.
Try creating a KNN model with K = 10, and store the confusion matrix in knn.table and the accuracy in knn.acc.
Assume that the ISLR2 and class libraries have been loaded, and that the Smarket dataset has been loaded and attached.