Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

Repeat question 4 using QDA. Perform QDA on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the test data? Store the model in qda.fit, the predictions in qda.pred and the accuracy in qda.acc.
Repeat question 4 using logistic regression. Perform logistic regression on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the test data? Store the model in glm.fit, the predictions in glm.pred and the accuracy in glm.acc. Use a threshold of 0.5 to classify predicted probabilities as 0 or 1 (numeric vector).
Repeat question 4 using naive Bayes. Perform naive Bayes on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the test data? Store the model in nb.fit, the predictions in nb.pred and the accuracy in nb.acc.
Repeat question 4 using KNN. Perform KNN on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the scaled test data? Store the predictions in knn.pred and the accuracy in knn.acc. Use \(K = 5\). Set a seed value of 1.
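The QDA, logistic regression and naive Bayes questions above all follow the same fit, predict, score pattern. The sketch below illustrates that pattern under stated assumptions and is not the reference solution: the definition of mpg01 (1 when mpg is above its median) and the 70/30 random split are assumptions made only so the block runs on its own, since the actual setup from question 4 is not shown here. The names train.idx, form and glm.probs are illustrative.

```r
library(ISLR2)  # Auto data
library(MASS)   # qda()
library(e1071)  # naiveBayes()

# ASSUMED setup: the exercise provides data, data.train, data.test and
# mpg01.test already; this definition and split are placeholders.
mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
data <- data.frame(Auto, mpg01)
set.seed(1)
train.idx <- sample(nrow(data), size = floor(0.7 * nrow(data)))
data.train <- data[train.idx, ]
data.test <- data[-train.idx, ]
mpg01.test <- data.test$mpg01

# The same four predictors are used by every model.
form <- mpg01 ~ cylinders + weight + displacement + horsepower

# QDA: predict() returns a list; the predicted labels are in $class.
qda.fit <- qda(form, data = data.train)
qda.pred <- predict(qda.fit, data.test)$class
qda.acc <- mean(qda.pred == mpg01.test)

# Logistic regression: predict probabilities, then threshold at 0.5
# to obtain a numeric vector of 0s and 1s.
glm.fit <- glm(form, data = data.train, family = binomial)
glm.probs <- predict(glm.fit, data.test, type = "response")
glm.pred <- ifelse(glm.probs > 0.5, 1, 0)
glm.acc <- mean(glm.pred == mpg01.test)

# Naive Bayes: predict() returns the predicted class labels directly.
nb.fit <- naiveBayes(form, data = data.train)
nb.pred <- predict(nb.fit, data.test)
nb.acc <- mean(nb.pred == mpg01.test)
```

In each case the accuracy is the fraction of test observations whose predicted label equals the observed mpg01.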

Recall that the KNN function knn() requires four inputs: train.X, test.X, train.mpg01, and the value for \(K\). This time, however, first scale the independent variables with scale() before computing train.X and test.X. KNN classifies by distance, so a variable measured on a large numeric scale dominates one measured on a small scale. For example, a regular car has 4 cylinders and around 100 horsepower, so without scaling, horsepower would have a much larger effect than cylinders on the distance between observations.
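The effect of scale on distances can be seen in a small sketch. The three cars below are made-up values chosen for illustration, not rows from the Auto data, and the names cars, d.raw and d.scaled are hypothetical.

```r
# Three made-up cars: A and C have the same number of cylinders,
# B has twice as many but a horsepower value close to A's.
cars <- rbind(A = c(cylinders = 4, horsepower = 100),
              B = c(cylinders = 8, horsepower = 105),
              C = c(cylinders = 4, horsepower = 108))

# Euclidean distances on the raw values: horsepower differences dominate,
# so A ends up closer to B than to C.
d.raw <- as.matrix(dist(cars))
d.raw["A", c("B", "C")]

# After scale() each column has mean 0 and standard deviation 1,
# so the cylinder difference counts as much as the horsepower difference
# and A is now closer to C than to B.
d.scaled <- as.matrix(dist(scale(cars)))
d.scaled["A", c("B", "C")]
```

The nearest neighbour of car A changes once the variables are put on a common scale, which is why the predictors are scaled before calling knn().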

Note: When using KNN (or any model that requires scaling), the test data should normally be scaled using the mean and standard deviation computed from the training data, in order to avoid data leakage. However, there is a mistake in the current setup of this exercise, so for now you should scale the test data using the mean and standard deviation of the TEST set. We’ll correct this for next year.
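The two scaling conventions mentioned in the note can be sketched as follows. The data are simulated stand-ins, not the Auto data, and make.toy, toy.train, toy.test, predictors, test.X.trainstats and test.X.teststats are hypothetical names introduced for this illustration only.

```r
library(class)  # knn()

# Simulated stand-in data (NOT the Auto data): two predictors on
# very different scales and a 0/1 label.
set.seed(1)
make.toy <- function(n) {
  cylinders <- sample(c(4, 6, 8), n, replace = TRUE)
  horsepower <- 40 + 15 * cylinders + rnorm(n, sd = 15)
  mpg01 <- as.numeric(cylinders == 4)
  data.frame(cylinders, horsepower, mpg01)
}
toy.train <- make.toy(60)
toy.test <- make.toy(20)
predictors <- c("cylinders", "horsepower")

# scale() stores the column means and standard deviations it used
# in the attributes "scaled:center" and "scaled:scale".
train.X <- scale(toy.train[, predictors])

# Convention 1 (the usual one): reuse the TRAINING means and standard deviations.
test.X.trainstats <- scale(toy.test[, predictors],
                           center = attr(train.X, "scaled:center"),
                           scale = attr(train.X, "scaled:scale"))

# Convention 2 (what this exercise asks for at the moment): use the test
# set's own means and standard deviations.
test.X.teststats <- scale(toy.test[, predictors])

# KNN with K = 5; the seed makes the random tie-breaking reproducible.
set.seed(1)
knn.pred <- knn(train.X, test.X.teststats, cl = factor(toy.train$mpg01), k = 5)
knn.acc <- mean(knn.pred == toy.test$mpg01)
```

Only the second convention gives test columns with mean exactly 0. Under the first, the test columns are centred on the training means, which is what keeps information from the test set out of the preprocessing.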

Assume that:

The ISLR2, MASS, e1071, and class libraries have been loaded
The Auto dataset has been loaded and attached
The variable mpg01 has been created and combined with the Auto data set in data. This data set has been split into a training set data.train and a test set data.test, and the test observations for the dependent variable mpg01 are in mpg01.test.