We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

Generate a data set with \(n = 500\) and \(p = 2\), such that the observations belong to two classes with a quadratic decision boundary between them.

set.seed(1)                                  # for reproducibility
x1 <- runif(500) - 0.5                       # n = 500 uniform draws centered on 0
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)                   # class 1 where x1^2 > x2^2: a quadratic boundary
data <- data.frame(x1 = x1, x2 = x2, y = y)
  1. Plot the observations, colored according to their class labels. Your plot should display \(X_1\) on the \(x\)-axis and \(X_2\) on the \(y\)-axis.
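
    One way to sketch this with base R graphics (the color choice is arbitrary):

    plot(x1[y == 0], x2[y == 0], col = "blue", xlab = "X1", ylab = "X2",
         xlim = c(-0.5, 0.5), ylim = c(-0.5, 0.5))    # class 0 in blue
    points(x1[y == 1], x2[y == 1], col = "red")       # class 1 in red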

  2. Fit a logistic regression model to the data, using \(X_1\) and \(X_2\) as predictors. Store the model in glm.fit1.
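
    One way to fit it (family = binomial gives logistic regression):

    glm.fit1 <- glm(y ~ x1 + x2, data = data, family = binomial)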

  3. Apply this model to the training data in order to obtain a predicted class label for each training observation, using a threshold of 0.5. Plot the observations, colored according to the predicted class labels. (You can first plot all the observations with predicted class 0 using the plot() function, then add all the observations with predicted class 1 using the points() function; see the sketch after this question.)

    MC1: Is the decision boundary linear or non-linear?

    1. linear
    2. non-linear
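
    A sketch of this step; the plot_by_label() helper is our own (hypothetical, not part of the exercise) so the same plotting pattern can be reused below:

    # Hypothetical helper: plot observations colored by a 0/1 label vector
    plot_by_label <- function(lab) {
      plot(x1[lab == 0], x2[lab == 0], col = "blue", xlab = "X1", ylab = "X2",
           xlim = c(-0.5, 0.5), ylim = c(-0.5, 0.5))
      points(x1[lab == 1], x2[lab == 1], col = "red")
    }

    probs1 <- predict(glm.fit1, data, type = "response")  # fitted probabilities
    preds1 <- 1 * (probs1 > 0.5)                          # threshold at 0.5
    plot_by_label(preds1)
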
  4. Now fit a logistic regression model to the data using non-linear functions of \(X_1\) and \(X_2\) as predictors; that is, model the log-odds as \(\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2\), where \(p = P(y = 1)\). Store the model in glm.fit2.
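
    One way to specify the quadratic and interaction terms is to wrap them in I() inside the formula:

    glm.fit2 <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2),
                    data = data, family = binomial)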

  5. Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels (a sketch follows this question).

    MC2: Is the decision boundary linear or non-linear?

    1. linear
    2. non-linear
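
    Same pattern as before, reusing the hypothetical helper from step 3:

    probs2 <- predict(glm.fit2, data, type = "response")
    preds2 <- 1 * (probs2 > 0.5)
    plot_by_label(preds2)
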
  6. Fit a support vector classifier with a linear kernel and cost=0.01 to the data with \(X_1\) and \(X_2\) as predictors. Store the model in svm.fit1.
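
    A sketch, assuming the e1071 package used in the ISLR labs; svm() performs classification when the response is a factor:

    library(e1071)
    svm.fit1 <- svm(as.factor(y) ~ x1 + x2, data = data,
                    kernel = "linear", cost = 0.01)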

  7. Obtain a class prediction for each training observation and plot the observations, colored according to the predicted class labels (see the sketch after this question).

    MC3: Does the linear support vector classifier separate the classes well?

    1. Yes, most of the points are well classified
    2. No, most of the points are poorly classified
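
    With a factor response, predict() returns class labels directly:

    svm.preds1 <- predict(svm.fit1, data)
    plot_by_label(svm.preds1)
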
  8. Fit an SVM with a radial basis kernel and gamma=1 to the data, with \(X_1\) and \(X_2\) as predictors. Store the model in svm.fit2.
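
    Along the same lines, leaving cost at its default of 1:

    svm.fit2 <- svm(as.factor(y) ~ x1 + x2, data = data,
                    kernel = "radial", gamma = 1)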

  9. Obtain a class prediction for each training observation, and plot the observations, colored according to the predicted class labels (a sketch follows this question).

    MC4: Does the radial basis SVM separate the classes well?

    1. Yes, most of the points are well classified
    2. No, most of the points are poorly classified
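
    As in step 7:

    svm.preds2 <- predict(svm.fit2, data)
    plot_by_label(svm.preds2)
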
  10. MC5: Which of these statements is false? (Only one answer.)

    1. We may conclude that an SVM with a non-linear kernel and logistic regression with interaction terms are both very powerful for finding non-linear decision boundaries.
    2. An SVM with a linear kernel and logistic regression without any interaction terms are very bad when it comes to finding non-linear decision boundaries.
    3. It is easier to find non-linear boundaries with logistic regression than with an SVM with a radial kernel.

Assume that: