This question uses the Caravan data set from the ISLR2 package. We first create a training set consisting of the first 1000 observations, and a test set consisting of the remaining observations.

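library(ISLR2)  # provides the Caravan data set
set.seed(1)
train <- 1:1000  # indices of the first 1000 observations
# Recode the response as numeric 0/1 (1 = "Yes")
Caravan$Purchase <- ifelse(Caravan$Purchase == "Yes", 1, 0)
Caravan.train <- Caravan[train, ]
Caravan.test <- Caravan[-train, ]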

Questions

  1. Fit a boosting model to the training set with Purchase as the response and the other variables as predictors. Use a seed of 1, 1000 trees, a Bernoulli distribution, and a shrinkage value of 0.01.
    • MC1:
      Which predictors appear to be most important?
      • 1: PPERSAUT and MKOOPKLA.
      • 2: ABYSTAND and AINBOED.
      • 3: It is not possible to calculate variable importances for a boosting model.
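
A minimal sketch of how such a fit might be set up, assuming the gbm package is used for boosting (the question does not name a package). The object names boost.caravan and influence are hypothetical. gbm may warn about predictors that show no variation in the first 1000 rows.

```r
library(ISLR2)  # Caravan data set
library(gbm)    # assumed boosting implementation

# Preprocessing copied from the question
set.seed(1)
train <- 1:1000
Caravan$Purchase <- ifelse(Caravan$Purchase == "Yes", 1, 0)
Caravan.train <- Caravan[train, ]
Caravan.test <- Caravan[-train, ]

# Fit the boosted model with the settings stated in the question
set.seed(1)
boost.caravan <- gbm(
  Purchase ~ .,
  data = Caravan.train,
  distribution = "bernoulli",
  n.trees = 1000,
  shrinkage = 0.01
)

# Relative influence of each predictor, sorted from most to least influential
influence <- summary(boost.caravan, plotit = FALSE)
head(influence)
```

The first rows of influence list the predictors with the largest relative influence, which is the quantity the multiple-choice item asks about.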

  2. Use the boosting model to predict the response on the test data. Predict that a person will make a purchase if the estimated probability of purchase is greater than 20%. Form a confusion matrix. What fraction of the people predicted to make a purchase do in fact make one? Store this fraction in truepositive.
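
The requested fraction is the precision of the classifier: true positives divided by all predicted positives. The sketch below illustrates the thresholding and confusion-matrix steps on a small made-up vector of probabilities; probs and actual are hypothetical toy data, not Caravan results. In the exercise itself, the probabilities would instead come from calling predict() on the fitted model with the test set and type = "response".

```r
# Hypothetical toy data for illustration only (not taken from Caravan)
probs  <- c(0.05, 0.25, 0.40, 0.10, 0.30, 0.15)  # estimated P(purchase)
actual <- c(0, 1, 0, 0, 1, 0)                    # observed 0/1 response

# Predict a purchase when the estimated probability exceeds 20%
pred <- ifelse(probs > 0.2, 1, 0)

# Confusion matrix: rows are predictions, columns are observed values
conf <- table(Predicted = pred, Actual = actual)
print(conf)

# Fraction of predicted purchasers who actually purchased
truepositive <- conf["1", "1"] / sum(conf["1", ])
```

Indexing the table by the labels "1" and "0" rather than by position keeps the calculation correct regardless of how the rows and columns happen to be ordered.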

Assume that:

Copy the preprocessing code given above into your solution.