In this exercise, we will predict whether the number of applications (Apps) is high or low. For this approach, known as classification, we need a qualitative response variable.
Note: this exercise is based on the lab of chapter 4 (Classification).
Make the variable Apps binary. Assume that the number of applications is high when it is greater than or equal to 2000 (assign a value of 1) and low when it is less than 2000 (assign a value of 0). Store the newly created variable as the AppsBinary column in the dataset.
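The thresholding rule above can be sketched with ifelse(). This is a minimal illustration on a hypothetical toy data frame (toy is an assumption standing in for College); only the column names Apps and AppsBinary come from the exercise.

```r
# Hypothetical toy data standing in for the College data frame
toy <- data.frame(Apps = c(150, 1999, 2000, 5400))

# 1 when Apps is greater than or equal to 2000 (high), 0 otherwise (low)
toy$AppsBinary <- ifelse(toy$Apps >= 2000, 1, 0)

print(toy)
```

Note that the boundary value 2000 itself counts as high, because the comparison is >= rather than >.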
Split the data set into a training and a test set with a 60/40 distribution.
Set a seed value of 42 and use the sample() function. Store the training and test data frames in college.train and college.test, respectively.
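One possible way to build such a split is to sample 60% of the row indices and use the remaining rows as the test set. The sketch below runs on a hypothetical toy data frame, and the names toy, train.idx, toy.train and toy.test are assumptions for illustration. Rounding the training size with round() is also an assumption, since the exercise does not say how to handle a non-integer 60% share.

```r
# Hypothetical toy data standing in for the College data frame
toy <- data.frame(x = 1:10, y = rep(c(0, 1), 5))

set.seed(42)  # makes the random split reproducible
n <- nrow(toy)

# Draw 60% of the row indices without replacement
train.idx <- sample(n, size = round(0.6 * n))

toy.train <- toy[train.idx, ]   # 60% training rows
toy.test  <- toy[-train.idx, ]  # remaining 40% test rows
```

Negative indexing with -train.idx guarantees that no row appears in both sets.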
Fit a logistic regression model on the training set. Use AppsBinary as the dependent variable and all other variables except Apps as predictors. Store the model in glm.fit and the predicted probabilities on the test set in glm.probs.
Note: Do not forget to exclude the original Apps variable when fitting the model!
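The fitting and prediction step might look like the sketch below. It uses simulated data rather than College, so the data frame toy, its columns x1 and x2, and the split objects are all hypothetical. The formula AppsBinary ~ . - Apps is one common way to use every other column while dropping Apps.

```r
set.seed(1)

# Hypothetical simulated data standing in for the College data frame
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Apps <- round(2000 + 800 * toy$x1 + rnorm(n, sd = 1000))
toy$AppsBinary <- ifelse(toy$Apps >= 2000, 1, 0)

# 60/40 split of the simulated rows
train.idx <- sample(n, size = round(0.6 * n))
toy.train <- toy[train.idx, ]
toy.test  <- toy[-train.idx, ]

# "." means all other columns; "- Apps" removes the original count,
# which would otherwise determine the response perfectly
glm.fit <- glm(AppsBinary ~ . - Apps, data = toy.train, family = binomial)

# type = "response" returns probabilities rather than log-odds
glm.probs <- predict(glm.fit, newdata = toy.test, type = "response")
```

Without type = "response", predict() returns values on the logit scale, which should not be compared against a 0.5 probability cutoff.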
Assign a value of 1 when the predicted probability is greater than 0.5 and a value of 0 when the predicted probability is less than or equal to 0.5. Store the predicted labels in glm.pred.
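As a small illustration of this cutoff rule, the sketch below applies it to a hypothetical vector of probabilities (probs and pred are assumed names, not the objects from the exercise).

```r
# Hypothetical predicted probabilities standing in for glm.probs
probs <- c(0.10, 0.50, 0.51, 0.93)

# Strictly greater than 0.5 maps to 1; exactly 0.5 maps to 0
pred <- ifelse(probs > 0.5, 1, 0)

print(pred)
```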
Create a confusion matrix and store the matrix in glm.table. Furthermore, calculate the test accuracy and store this value in glm.acc.
Note: Give the predictions as the first argument in the table() function, and the ground truth as the second argument.
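A confusion matrix and the accuracy can be sketched as follows, using hypothetical label vectors (pred, truth, conf and acc are assumed names for illustration). With the predictions passed first, they form the rows of the table and the ground truth forms the columns.

```r
# Hypothetical predicted and true labels
pred  <- c(1, 0, 1, 1, 0, 0)
truth <- c(1, 0, 0, 1, 1, 0)

# Predictions first (rows), ground truth second (columns)
conf <- table(pred, truth)
print(conf)

# Accuracy: share of observations whose predicted label matches the truth
acc <- mean(pred == truth)
print(acc)
```

The same accuracy can be read off the table as the sum of the diagonal divided by the total count, sum(diag(conf)) / sum(conf).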
Assume that the ISLR2 library has been loaded and that the College dataset has been loaded and attached.