Now the knn() function can be used to predict the market’s movement for the dates in 2005. We set a random seed before we apply knn() because if several observations are tied as nearest neighbors, then R will randomly break the tie. Therefore, a seed must be set in order to ensure reproducibility of results.
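
The tie-breaking behaviour described above can be seen in isolation with a minimal sketch. The data here are a made-up one-dimensional toy example (the names toy.train, toy.labels and toy.test are hypothetical and not part of the lab), chosen so that the two training points are exactly equidistant from the test point and carry different labels.

```r
library(class)  # provides knn()

# Made-up toy data (not Smarket): one training point on each side of the
# test point, at exactly the same distance, with different labels.
toy.train <- matrix(c(-1, 1), ncol = 1)
toy.labels <- factor(c("Down", "Up"))
toy.test <- matrix(0, ncol = 1)

# Without a fixed seed, repeated calls may return either label,
# because the tie between the two neighbors is broken at random.
draws <- replicate(200, as.character(knn(toy.train, toy.test, toy.labels, k = 1)))
table(draws)

# With the same seed set before each call, the prediction is identical.
set.seed(1)
first <- knn(toy.train, toy.test, toy.labels, k = 1)
set.seed(1)
second <- knn(toy.train, toy.test, toy.labels, k = 1)
identical(first, second)
```

In this sketch the seed does not change which observations are nearest; it only fixes how the random tie-break falls, which is why the lab sets it before calling knn().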

> set.seed(1)
> knn.pred <- knn(train.X, test.X, train.Direction, k = 1)
> table(knn.pred, Direction.2005)
        Direction.2005
knn.pred Down Up
    Down   43 58
    Up     68 83
> (83 + 43) / 252
[1] 0.5
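
The hand calculation above adds the two diagonal cells of the confusion matrix and divides by the number of test observations. As a small sketch, the same number can be obtained from the matrix itself; the counts below are typed in from the K = 1 output, and the names conf.k1 and acc.k1 are illustrative only.

```r
# Confusion matrix for K = 1, copied from the table() output
# (rows: predicted direction, columns: true direction in 2005).
# matrix() fills column by column, so the first two values are the "Down" column.
conf.k1 <- matrix(c(43, 68, 58, 83), nrow = 2,
                  dimnames = list(knn.pred = c("Down", "Up"),
                                  Direction.2005 = c("Down", "Up")))

# Correct predictions lie on the diagonal; accuracy is their share of all cases.
acc.k1 <- sum(diag(conf.k1)) / sum(conf.k1)
acc.k1

# The test error rate is the complement of the accuracy.
err.k1 <- 1 - acc.k1
err.k1
```

Applied to a table object such as the one returned by table(knn.pred, Direction.2005), the same sum(diag(...)) / sum(...) expression should agree with mean(knn.pred == Direction.2005).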

The results using K = 1 are not very good, since only 50% of the observations are correctly predicted. Of course, it may be that K = 1 results in an overly flexible fit to the data. Below, we repeat the analysis using K = 3.

> knn.pred <- knn(train.X, test.X, train.Direction, k = 3)
> table(knn.pred, Direction.2005)
        Direction.2005
knn.pred Down Up
    Down   48 54
    Up     63 87
> mean(knn.pred == Direction.2005)
[1] 0.5357143

The results have improved slightly, but increasing K further turns out to provide no further improvement. It appears that for these data, QDA provides the best results of the methods that we have examined so far.
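
One way to check how accuracy changes with K is to loop over several values. The helper below, knn.acc.by.k, is a hypothetical function written for illustration and is not part of the lab or of the class package; it is exercised here on made-up synthetic data rather than on Smarket, so its output says nothing about the stock market results.

```r
library(class)  # provides knn()

# Hypothetical helper (not part of the lab): test-set accuracy of KNN
# for each value of K in `ks`. It does not touch the random seed, so set
# one beforehand if ties among neighbors are possible in your data.
knn.acc.by.k <- function(train.X, test.X, train.cl, test.cl, ks) {
  acc <- sapply(ks, function(k) {
    pred <- knn(train.X, test.X, train.cl, k = k)
    mean(pred == test.cl)
  })
  names(acc) <- ks
  acc
}

# Made-up synthetic data (not Smarket): two well-separated groups in 2-D.
set.seed(42)
toy.train <- rbind(matrix(rnorm(40, mean = -3), ncol = 2),
                   matrix(rnorm(40, mean = 3), ncol = 2))
toy.cl <- factor(rep(c("Down", "Up"), each = 20))

# Evaluated on the training points themselves, purely as a sanity check.
toy.acc <- knn.acc.by.k(toy.train, toy.train, toy.cl, toy.cl, ks = c(1, 3, 5))
toy.acc
```

With the objects used in this lab, a call such as knn.acc.by.k(train.X, test.X, train.Direction, Direction.2005, ks = c(1, 3, 5, 7)) would return one test accuracy per value of K, assuming those objects are defined as in the earlier steps.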

Try fitting a KNN model with K = 10. Store the confusion matrix in knn.table and the accuracy in knn.acc.

Assume that:

The ISLR2 and class libraries have been loaded
The Smarket dataset has been loaded and attached
The seed has been set to 123 (do not set the seed again yourself)