This problem involves the OJ data set, which is part of the ISLR2 package.
Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
Store the randomly sampled indices in train, the training set in OJ.train, and the test set in OJ.test.
Make sure you set.seed(1) before sampling the indices.
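The split described above could be sketched as follows. This is a minimal sketch, assuming that sample() over the row numbers of OJ is the intended sampling method; the exercise does not name the function to use.

```r
library(ISLR2)  # provides the OJ data set

set.seed(1)                          # seed first, so the sample is reproducible
train <- sample(nrow(OJ), 800)       # 800 row indices drawn without replacement
OJ.train <- OJ[train, ]              # training set: the sampled rows
OJ.test  <- OJ[-train, ]             # test set: every remaining row

dim(OJ.train)
dim(OJ.test)
```

Negative indexing with -train keeps exactly the rows that were not sampled, so the two sets never overlap.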
Fit a support vector classifier to the training data using cost = 10, with Purchase as the response and the other variables as predictors.
Store the model in svm.lin.
Store the total number of support vectors in nSV.lin.
What are the training and test error rates? Store them in train.err.lin and test.err.lin, respectively.
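One way to fit the classifier and compute both error rates is sketched below. It assumes that a support vector classifier corresponds to svm() from e1071 with kernel = "linear", and it repeats the train/test split so that the block runs on its own.

```r
library(ISLR2)
library(e1071)

# Recreate the split so this block is self-contained
set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test  <- OJ[-train, ]

# Linear kernel = support vector classifier (assumption about the intended kernel)
svm.lin <- svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 10)
nSV.lin <- svm.lin$tot.nSV           # total number of support vectors

# Error rate = share of observations whose predicted class differs from the truth
train.err.lin <- mean(predict(svm.lin, OJ.train) != OJ.train$Purchase)
test.err.lin  <- mean(predict(svm.lin, OJ.test)  != OJ.test$Purchase)
```

The per-class counts are available in svm.lin$nSV; tot.nSV is their sum.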
Use the tune() function to select an optimal cost. Consider values in the range (0.01, 0.1, 1).
Store the output of tune() in tune.out.lin and the best parameters in best.param.lin.
Make sure you set.seed(2).
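The tuning step might look like the sketch below. It assumes that the seed is set immediately before the call to tune() and that the default resampling of tune() (10-fold cross-validation) is acceptable; the exercise specifies neither.

```r
library(ISLR2)
library(e1071)

# Recreate the split so this block is self-contained
set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]

set.seed(2)                          # seed the cross-validation folds
tune.out.lin <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear",
                     ranges = list(cost = c(0.01, 0.1, 1)))
best.param.lin <- tune.out.lin$best.parameters  # one-row data frame

summary(tune.out.lin)                # cross-validation error for each cost
```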
Compute the training and test error rates using this new value for cost, and store them in train.err.lin.opt and test.err.lin.opt, respectively.
Store the new model in svm.lin.opt.
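A possible sketch of the refit is shown below. It assumes that the model is refitted explicitly with the selected cost; taking tune.out.lin$best.model would be an alternative. The helper err.rate is hypothetical and not part of the exercise.

```r
library(ISLR2)
library(e1071)

# Recreate the split and the tuning result so this block is self-contained
set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test  <- OJ[-train, ]

set.seed(2)
tune.out.lin <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear",
                     ranges = list(cost = c(0.01, 0.1, 1)))
best.param.lin <- tune.out.lin$best.parameters

# Hypothetical helper: misclassification rate of a model on a data set
err.rate <- function(model, data) mean(predict(model, data) != data$Purchase)

# Refit on the full training set with the selected cost
svm.lin.opt <- svm(Purchase ~ ., data = OJ.train, kernel = "linear",
                   cost = best.param.lin$cost)
train.err.lin.opt <- err.rate(svm.lin.opt, OJ.train)
test.err.lin.opt  <- err.rate(svm.lin.opt, OJ.test)
```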
Repeat parts 2 through 5 using a support vector machine with a radial kernel.
Initially for question 2, use the default value for gamma.
Use the variable names with ‘lin’ replaced with ‘rad’.
Use the same range for cost and the range (0.01, 0.1, 1) for gamma for question 4.
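The radial-kernel version could be sketched as follows. It assumes that cost = 10 and set.seed(2) carry over from the linear case, since the text only says to repeat the earlier parts. The helper err.rate is hypothetical.

```r
library(ISLR2)
library(e1071)

set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test  <- OJ[-train, ]

# Hypothetical helper: misclassification rate of a model on a data set
err.rate <- function(model, data) mean(predict(model, data) != data$Purchase)

# Initial fit: cost = 10 (assumed), gamma left at the svm() default
svm.rad <- svm(Purchase ~ ., data = OJ.train, kernel = "radial", cost = 10)
nSV.rad <- svm.rad$tot.nSV
train.err.rad <- err.rate(svm.rad, OJ.train)
test.err.rad  <- err.rate(svm.rad, OJ.test)

# Tune cost and gamma jointly over a 3 x 3 grid
set.seed(2)
tune.out.rad <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "radial",
                     ranges = list(cost  = c(0.01, 0.1, 1),
                                   gamma = c(0.01, 0.1, 1)))
best.param.rad <- tune.out.rad$best.parameters

# Refit with the selected parameters
svm.rad.opt <- svm(Purchase ~ ., data = OJ.train, kernel = "radial",
                   cost = best.param.rad$cost, gamma = best.param.rad$gamma)
train.err.rad.opt <- err.rate(svm.rad.opt, OJ.train)
test.err.rad.opt  <- err.rate(svm.rad.opt, OJ.test)
```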
Repeat parts 2 through 5 using a support vector machine with a polynomial kernel.
Initially for question 2, use degree = 2.
Use the variable names with ‘lin’ replaced with ‘poly’.
Use the same range for cost and the range (2, 3, 4) for degree for question 4.
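A corresponding sketch for the polynomial kernel is given below. As with the radial case, it assumes that cost = 10 and set.seed(2) carry over from the linear case, and it leaves gamma and coef0 at the svm() defaults. The helper err.rate is hypothetical.

```r
library(ISLR2)
library(e1071)

set.seed(1)
train <- sample(nrow(OJ), 800)
OJ.train <- OJ[train, ]
OJ.test  <- OJ[-train, ]

# Hypothetical helper: misclassification rate of a model on a data set
err.rate <- function(model, data) mean(predict(model, data) != data$Purchase)

# Initial fit: cost = 10 (assumed), degree = 2
svm.poly <- svm(Purchase ~ ., data = OJ.train, kernel = "polynomial",
                cost = 10, degree = 2)
nSV.poly <- svm.poly$tot.nSV
train.err.poly <- err.rate(svm.poly, OJ.train)
test.err.poly  <- err.rate(svm.poly, OJ.test)

# Tune cost and degree jointly over a 3 x 3 grid
set.seed(2)
tune.out.poly <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "polynomial",
                      ranges = list(cost   = c(0.01, 0.1, 1),
                                    degree = c(2, 3, 4)))
best.param.poly <- tune.out.poly$best.parameters

# Refit with the selected parameters
svm.poly.opt <- svm(Purchase ~ ., data = OJ.train, kernel = "polynomial",
                    cost = best.param.poly$cost, degree = best.param.poly$degree)
train.err.poly.opt <- err.rate(svm.poly.opt, OJ.train)
test.err.poly.opt  <- err.rate(svm.poly.opt, OJ.test)
```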
MC1: Overall, which approach seems to give the best results on this data?
Assume that:
the e1071 and ISLR2 libraries have been loaded
the OJ data set has been loaded and attached