In this exercise, we will predict whether the number of applications (Apps) is high or low. For this approach, known as classification, we need a qualitative response variable.

Note: this exercise is based on the lab in Chapter 4 (Classification).

Questions



  1. Make the variable Apps binary. Assume that the number of applications is high when it is greater than or equal to 2000 (assign a value of 1) and low when it is less than 2000 (assign a value of 0). Store the newly created variable as the AppsBinary column in the dataset.

  2. Split the data set into a training set and a test set using a 60/40 split. Set a seed value of 42 and use the sample() function. Store the training and test data frames in college.train and college.test, respectively.

  3. Fit a logistic regression model on the training set. Use AppsBinary as the dependent variable and all other variables except Apps as predictors. Store the model in glm.fit and the predicted probabilities on the test set in glm.probs.
    Note: Do not forget to exclude the original Apps variable when fitting the model!

  4. Assign a value of 1 when the predicted probability is greater than 0.5 and a value of 0 when the predicted probability is less than or equal to 0.5. Store the predicted labels in glm.pred.

  5. Create a confusion matrix and store the matrix in glm.table. Furthermore, calculate the test accuracy and store this value in glm.acc.
    Note: Give the predictions as the first argument to the table() function and the ground truth as the second argument.
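
The five steps above could be sketched roughly as follows. This is a minimal illustration rather than a reference solution: to stay self-contained it runs on a small simulated data frame instead of the real data, so the simulated values, the simulation seed, the choice of Accept and Outstate as the only predictors, and the helper name train.idx are all assumptions made for this example. The object names college.train, college.test, glm.fit, glm.probs, glm.pred, glm.table and glm.acc follow the questions.

```r
# Simulated stand-in for the data set (assumption: made-up values, only two predictors)
set.seed(1)
n <- 500
college <- data.frame(
  Accept   = round(runif(n, 100, 5000)),
  Outstate = round(runif(n, 3000, 20000))
)
college$Apps <- pmax(0, round(1.3 * college$Accept + rnorm(n, sd = 800)))

# 1. Binary response: 1 if Apps >= 2000, otherwise 0
college$AppsBinary <- ifelse(college$Apps >= 2000, 1, 0)

# 2. 60/40 split using sample() with seed 42
set.seed(42)
train.idx <- sample(nrow(college), size = round(0.6 * nrow(college)))  # hypothetical helper name
college.train <- college[train.idx, ]
college.test  <- college[-train.idx, ]

# 3. Logistic regression on all predictors except the original Apps column
glm.fit   <- glm(AppsBinary ~ . - Apps, data = college.train, family = binomial)
glm.probs <- predict(glm.fit, newdata = college.test, type = "response")

# 4. Predicted labels with a 0.5 threshold
glm.pred <- ifelse(glm.probs > 0.5, 1, 0)

# 5. Confusion matrix (predictions first, ground truth second) and test accuracy
glm.table <- table(glm.pred, college.test$AppsBinary)
glm.acc   <- mean(glm.pred == college.test$AppsBinary)
```

With type = "response", predict() returns probabilities rather than log-odds, which is what the 0.5 threshold in step 4 expects. On the real data the same calls should apply once college refers to the actual data frame; the accuracy obtained on the simulated data says nothing about the real exercise.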


Assume that: