In this problem, you will use support vector approaches to predict whether a given car gets high or low gas mileage, based on the Auto data set.

  1. Create a binary variable that takes the value 1 for cars with gas mileage above the median and 0 for cars with gas mileage below the median. Store the binary variable in the Auto data frame under the name mpglevel. Make sure that mpglevel is encoded as a factor. After you have added mpglevel, drop the mpg column with Auto$mpg <- NULL.

  2. Use the tune() function to fit a support vector classifier to the data with various values of cost (0.01, 0.1, 1), in order to predict whether a car gets high or low gas mileage. Use all the other variables as predictors. Store the outcome of the cross-validation in tune.out.lin, and the parameters of the best model and its performance in best.param.lin and best.perform.lin, respectively. Make sure to call set.seed(1) before running the cross-validation.

  3. Now repeat step 2, this time using SVMs with a radial basis kernel, with different values of gamma (0.01, 0.1, 1) and cost (0.01, 0.1, 1). Store the outcome of the cross-validation in tune.out.rad, and the parameters of the best model and its performance in best.param.rad and best.perform.rad, respectively. Make sure to call set.seed(1) before running the cross-validation.

  4. Now repeat step 2, this time using SVMs with a polynomial kernel, with different values of degree (2, 3, 4) and cost (0.01, 0.1, 1). Store the outcome of the cross-validation in tune.out.poly, and the parameters of the best model and its performance in best.param.poly and best.perform.poly, respectively. Make sure to call set.seed(1) before running the cross-validation.

  5. MC1: Which one of the models performed best?

    1. linear SVM
    2. polynomial SVM
    3. radial SVM
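
The workflow in steps 1 and 2 can be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes that Auto is the data set shipped with the ISLR package and that tune() and svm() come from the e1071 package, neither of which is stated in the task itself.

```r
# Assumption: Auto is provided by the ISLR package, tune() and svm() by e1071.
library(ISLR)
library(e1071)

data(Auto)

# Step 1: 1 if mpg is above the median, 0 otherwise, stored as a factor.
Auto$mpglevel <- as.factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))
Auto$mpg <- NULL  # drop the original mpg column

# Step 2: cross-validate a linear-kernel SVM over the cost grid.
set.seed(1)
tune.out.lin <- tune(svm, mpglevel ~ ., data = Auto, kernel = "linear",
                     ranges = list(cost = c(0.01, 0.1, 1)))

# Best parameter combination and its cross-validation error.
best.param.lin   <- tune.out.lin$best.parameters
best.perform.lin <- tune.out.lin$best.performance
```

Under the same assumptions, steps 3 and 4 would follow the same pattern: change kernel to "radial" or "polynomial" and add gamma or degree to the ranges list. Comparing the three best.perform.* values then indicates which kernel achieved the lowest cross-validation error.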

Assume that: