This problem uses the OJ data set, which is part of the ISLR2 package.

  1. Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations. Store the randomly sampled indices in train, the training set in OJ.train, and the test set in OJ.test. Make sure you call set.seed(1) before sampling the indices.

  2. Fit a support vector classifier to the training data using cost = 10, with Purchase as the response and the other variables as predictors. Store the model in svm.lin. Store the total number of support vectors in nSV.lin.

  3. What are the training and test error rates? Store them in train.err.lin and test.err.lin, respectively.

  4. Use the tune() function to select an optimal cost. Consider values in the range (0.01, 0.1, 1). Store the output of tune() in tune.out.lin and the best parameters in best.param.lin. Make sure you call set.seed(2) before calling tune().

  5. Compute the training and test error rates using this new value for cost, and store them in train.err.lin.opt and test.err.lin.opt, respectively. Store the new model in svm.lin.opt.

  6. Repeat parts 2 through 5 using a support vector machine with a radial kernel. For part 2, use the default value for gamma. Use the same variable names with ‘lin’ replaced by ‘rad’. For part 4, use the same range for cost and the range (0.01, 0.1, 1) for gamma.

  7. Repeat parts 2 through 5 using a support vector machine with a polynomial kernel. For part 2, use degree = 2. Use the same variable names with ‘lin’ replaced by ‘poly’. For part 4, use the same range for cost and the range (2, 3, 4) for degree.

  8. MC1: Overall, which approach seems to give the best results on this data?

    1. Linear SVM
    2. Polynomial SVM
    3. Radial SVM
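
As a starting point, parts 1 through 5 for the linear kernel might be sketched as follows. This is a minimal sketch rather than a reference solution: it assumes svm() and tune() come from the e1071 package (the problem does not name the package), that set.seed(2) is called immediately before tune(), and that the model in part 5 is refit explicitly with the selected cost. The helper err.rate() is a hypothetical convenience function, not part of any package.

```r
library(ISLR2)  # provides the OJ data set
library(e1071)  # assumed source of svm() and tune()

# Part 1: draw 800 training indices; the remaining rows form the test set.
set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test <- OJ[-train, ]

# Hypothetical helper: fraction of observations whose predicted
# Purchase class differs from the observed one.
err.rate <- function(fit, data) {
  mean(predict(fit, newdata = data) != data$Purchase)
}

# Part 2: support vector classifier (linear kernel) with cost = 10.
svm.lin <- svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 10)
nSV.lin <- svm.lin$tot.nSV  # total number of support vectors

# Part 3: training and test error rates.
train.err.lin <- err.rate(svm.lin, OJ.train)
test.err.lin <- err.rate(svm.lin, OJ.test)

# Part 4: cross-validated search over the given cost values.
set.seed(2)
tune.out.lin <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear",
                     ranges = list(cost = c(0.01, 0.1, 1)))
best.param.lin <- tune.out.lin$best.parameters

# Part 5: refit with the selected cost and recompute both error rates.
svm.lin.opt <- svm(Purchase ~ ., data = OJ.train, kernel = "linear",
                   cost = best.param.lin$cost)
train.err.lin.opt <- err.rate(svm.lin.opt, OJ.train)
test.err.lin.opt <- err.rate(svm.lin.opt, OJ.test)
```

The same pattern should carry over to parts 6 and 7 by setting kernel = "radial" or kernel = "polynomial" and adding gamma or degree to the ranges list passed to tune().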

Assume that: