The e1071 library includes a built-in function, tune(), to perform cross-validation. By default, tune() performs ten-fold cross-validation on a set of models of interest. To use it, we pass the model-fitting function, the formula and data, and the grid of parameter values that define the models under consideration. The following command indicates that we want to compare SVMs with a linear kernel, using a range of values of the cost parameter.

set.seed(1)
tune.out <- tune(svm, y ~ ., data = dat, kernel = "linear", ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
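
To make the procedure concrete, here is a minimal hand-written sketch of roughly what this call does: split the observations into ten folds once, then fit and score a linear SVM on those folds for each cost value. The simulated dat and the helper cv_error() are assumptions introduced for illustration only; they are not part of e1071. The numbers will generally differ from those reported by tune(), which draws its own random partition.

```r
library(e1071)

# Hypothetical stand-in for `dat`: two predictors and a two-class factor response.
set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x = x, y = as.factor(y))

# Assign each observation to one of k folds, once, so that every
# cost value is evaluated on the same partition.
k <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Hypothetical helper: mean misclassification rate over the k held-out folds.
cv_error <- function(data, folds, cost) {
  fold_errors <- sapply(sort(unique(folds)), function(i) {
    train <- data[folds != i, ]
    test <- data[folds == i, ]
    fit <- svm(y ~ ., data = train, kernel = "linear", cost = cost)
    mean(predict(fit, test) != test$y)
  })
  mean(fold_errors)
}

costs <- c(0.001, 0.01, 0.1, 1, 5, 10, 100)
errors <- sapply(costs, function(cst) cv_error(dat, folds, cst))
data.frame(cost = costs, error = errors)
```

In practice tune() is preferable, since it also records the dispersion of the fold errors and refits the best model on the full data.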

We can easily access the cross-validation errors for each of these models using the summary() command:

summary(tune.out)

Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation 
- best parameters:
 cost
  0.1
- best performance: 0.05 
- Detailed performance results:
   cost error dispersion
1 1e-03  0.55  0.4377975
2 1e-02  0.55  0.4377975
3 1e-01  0.05  0.1581139
4 1e+00  0.15  0.2415229
5 5e+00  0.15  0.2415229
6 1e+01  0.15  0.2415229
7 1e+02  0.15  0.2415229
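
The table printed above is also stored in the tuning object as a data frame, in its performances component, so it can be sorted or filtered like any other data frame. The sketch below assumes a simulated dat as a stand-in for the data set used in this section, so its values need not match the output shown.

```r
library(e1071)

# Hypothetical stand-in for `dat` (the real data set is defined elsewhere).
set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x = x, y = as.factor(y))

tune.out <- tune(svm, y ~ ., data = dat, kernel = "linear",
                 ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))

# The table printed by summary() is stored as a data frame.
perf <- tune.out$performances

# Sort by cross-validation error so the best cost values come first.
perf[order(perf$error), ]

# Row with the smallest cross-validation error (the first one if several tie).
perf[which.min(perf$error), ]
```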

We see that cost = 0.1 results in the lowest cross-validation error rate (0.05). The tune() function stores the best model obtained, which can be accessed as follows:

bestmod <- tune.out$best.model
summary(bestmod)

Questions

For the new data set below, perform 10-fold cross-validation to determine the best possible value of the cost parameter.
- Use a linear kernel.
- Scale the predictors using the built-in scale argument of svm().
- Search these values of the cost parameter: 0.001, 0.01, 0.1, 1, 5, 10, 100.
- Store the outcome of the cross-validation in tune.out. Store the parameters of the best model in best.parameters and its performance in best.performance.

Hint: The performance and parameters of the best model are stored as components of tune.out; names(tune.out) lists them.

The seed is set at the top of the exercise. Do not change its value or set another seed.

Assume that:

The e1071 library has been loaded