Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Repeat question 4 using QDA. Perform QDA on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the test data? Store the model in qda.fit, the predictions in qda.pred and the accuracy in qda.acc.

  2. Repeat question 4 using logistic regression. Perform logistic regression on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the test data? Store the model in glm.fit, the predictions in glm.pred and the accuracy in glm.acc. Use a threshold of 0.5 to classify the predicted probabilities as 0 or 1, and store the result as a numeric vector.

  3. Repeat question 4 using naive Bayes. Perform naive Bayes on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the test data? Store the model in nb.fit, the predictions in nb.pred and the accuracy in nb.acc.

  4. Repeat question 4 using KNN. Perform KNN on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the accuracy of the model obtained on the scaled test data? Store the predictions in knn.pred and the accuracy in knn.acc. Use \(K = 5\). Set a seed value of 1.

    Recall that the KNN function knn() requires four inputs: train.X, test.X, train.mpg01, and the value for \(K\). However, this time first scale the independent variables with scale() before computing train.X and test.X. Scaling matters because KNN classifies an observation based on its distance to other observations. For example, a regular car has 4 cylinders and around 100 horsepower. Without scaling, horsepower would have a much larger effect on the distance between the observations simply because its values are much larger.
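
The effect of scaling on distances can be illustrated with a minimal sketch in base R. It assumes a small made-up data frame called cars (hypothetical values, not the Auto data set) with only two predictors, and it is not a solution to the question above. It compares how much each variable contributes to the squared Euclidean distance between two cars, before and after scale(), and then splits the scaled matrix into train.X and test.X using a hypothetical index vector train.idx.

```r
# Toy data: hypothetical values, NOT the Auto data set
cars <- data.frame(
  cylinders  = c(4, 4, 8, 8, 6, 4),
  horsepower = c(90, 100, 200, 190, 150, 95)
)

# Squared difference per variable between car 1 and car 3 (raw scale).
# Their sum is the squared Euclidean distance that KNN would use.
raw.contrib <- (unlist(cars[1, ]) - unlist(cars[3, ]))^2
raw.share <- raw.contrib / sum(raw.contrib)

# Standardise every predictor to mean 0 and standard deviation 1
cars.scaled <- scale(cars)

# Same comparison on the scaled predictors
scaled.contrib <- (cars.scaled[1, ] - cars.scaled[3, ])^2
scaled.share <- scaled.contrib / sum(scaled.contrib)

# Share of the squared distance that comes from each variable
print(round(raw.share, 3))
print(round(scaled.share, 3))

# Split AFTER scaling; train.idx is a hypothetical choice of training rows
train.idx <- 1:4
train.X <- cars.scaled[train.idx, ]
test.X  <- cars.scaled[-train.idx, ]
```

On the raw scale almost all of the distance comes from horsepower, whereas after scaling the two variables contribute in roughly similar proportions. The resulting train.X and test.X are matrices that could be passed to knn() together with the training labels and a value for \(K\).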




Assume that: