We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Generate a data set with \(n = 500\) and \(p = 2\), such that the observations belong to two classes with a quadratic decision boundary between them.
set.seed(1)
x1 <- runif(500) - 0.5                      # uniform on (-0.5, 0.5)
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)                  # class 1 where x1^2 > x2^2: a quadratic boundary
data <- data.frame(x1 = x1, x2 = x2, y = y)
Plot the observations, colored according to their class labels. Your plot should display \(X_1\) on the \(x\)-axis and \(X_2\) on the \(y\)-axis.
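A minimal sketch of one way to draw this plot (the color and symbol choices are arbitrary; any two distinguishable colors work):

# Class 0 in blue, class 1 in red; fix the axis limits so both groups fit
plot(x1[y == 0], x2[y == 0], col = "blue", pch = 19,
     xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(x1[y == 1], x2[y == 1], col = "red", pch = 19)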
Fit a logistic regression model to the data, using \(X_1\) and \(X_2\) as predictors.
Store the model in glm.fit1.
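One way to fit this, as a sketch (glm() with family = binomial is the standard way to fit logistic regression in base R):

# Logistic regression with X1 and X2 as linear predictors
glm.fit1 <- glm(y ~ x1 + x2, data = data, family = binomial)
summary(glm.fit1)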
Apply this model to the training data in order to obtain a predicted class label for each training observation, using a threshold of 0.5. Plot the observations, colored according to the predicted class labels. (You can first plot all observations with predicted class 0 using the plot() function, then add all observations with predicted class 1 using the points() function.)
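A sketch of that step (probs1 and preds1 are illustrative names, not part of the exercise):

# Predicted probabilities on the training data, thresholded at 0.5
probs1 <- predict(glm.fit1, data, type = "response")
preds1 <- ifelse(probs1 > 0.5, 1, 0)
plot(x1[preds1 == 0], x2[preds1 == 0], col = "blue", pch = 19,
     xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(x1[preds1 == 1], x2[preds1 == 1], col = "red", pch = 19)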
MC1: Is the decision boundary linear or non-linear?
Now fit a logistic regression model to the data using non-linear functions of \(X_1\) and \(X_2\) as predictors.
Specifically, fit the model \(\log\frac{P(Y=1)}{1 - P(Y=1)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \beta_4 X_2^2 + \beta_5 X_1 X_2\).
Store the model in glm.fit2.
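One way to specify the quadratic and interaction terms is with I() in the model formula (a sketch; glm() may warn that fitted probabilities of 0 or 1 occurred, since the true boundary here is exactly quadratic):

# Logistic regression with squared terms and the interaction
glm.fit2 <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2),
                data = data, family = binomial)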
Apply this model to the training data in order to obtain a predicted class label for each training observation, again using a threshold of 0.5. Plot the observations, colored according to the predicted class labels.
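The same pattern as before, now with glm.fit2 (probs2 and preds2 are illustrative names):

probs2 <- predict(glm.fit2, data, type = "response")
preds2 <- ifelse(probs2 > 0.5, 1, 0)
plot(x1[preds2 == 0], x2[preds2 == 0], col = "blue", pch = 19,
     xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(x1[preds2 == 1], x2[preds2 == 1], col = "red", pch = 19)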
MC2: Is the decision boundary linear or non-linear?
Fit a support vector classifier with a linear kernel and cost=0.01 to the data, with \(X_1\) and \(X_2\) as predictors.
Store the model in svm.fit1.
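A sketch using the svm() function from e1071 (assumed loaded, per the note at the end; the response must be converted to a factor so that svm() performs classification rather than regression):

# Linear support vector classifier with a small cost
svm.fit1 <- svm(as.factor(y) ~ x1 + x2, data = data,
                kernel = "linear", cost = 0.01)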
Obtain a class prediction for each training observation and plot the observations, colored according to the predicted class labels.
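A sketch of this step (svm.preds1 is an illustrative name; predict() on a classification svm fit returns factor labels, here "0" and "1"):

svm.preds1 <- predict(svm.fit1, data)
plot(x1[svm.preds1 == "0"], x2[svm.preds1 == "0"], col = "blue", pch = 19,
     xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(x1[svm.preds1 == "1"], x2[svm.preds1 == "1"], col = "red", pch = 19)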
MC3: Does the linear support vector classifier separate the classes well?
Fit an SVM with a radial basis kernel to the data, with \(X_1\) and \(X_2\) as predictors, using gamma=1.
Store the model in svm.fit2.
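A sketch of the radial kernel fit (gamma = 1 as specified; the default cost is used here):

# Radial basis SVM with gamma = 1
svm.fit2 <- svm(as.factor(y) ~ x1 + x2, data = data,
                kernel = "radial", gamma = 1)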
Obtain a class prediction for each training observation, and plot the observations, colored according to the predicted class labels.
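The same plotting pattern once more, now with svm.fit2:

svm.preds2 <- predict(svm.fit2, data)
plot(x1[svm.preds2 == "0"], x2[svm.preds2 == "0"], col = "blue", pch = 19,
     xlab = "X1", ylab = "X2", xlim = range(x1), ylim = range(x2))
points(x1[svm.preds2 == "1"], x2[svm.preds2 == "1"], col = "red", pch = 19)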
MC4: Does the radial basis SVM separate the classes well?
MC5: Which of these statements is false? (Only one answer.)
Assume that the e1071 library has been loaded.