We continue to consider the use of a logistic regression model to predict the probability of default using income and balance on the Default data set. In particular, we will now compute estimates for the standard errors of the income and balance logistic regression coefficients in two different ways: (1) using the bootstrap, and (2) using the standard formula for computing the standard errors in the glm() function.
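
As a rough illustration of approach (2), the sketch below shows where the formula-based standard errors sit in a fitted glm object. It uses simulated data rather than the Default data set, and the names sim, log_odds, fit, se_formula and se_vcov are hypothetical, chosen only for this example.

```r
# Synthetic stand-in for the Default data (made-up values, not the real data set)
set.seed(1)
n <- 1000
sim <- data.frame(
  income  = rnorm(n, mean = 33000, sd = 13000),
  balance = pmax(rnorm(n, mean = 835, sd = 480), 0)
)

# Simulate a binary outcome whose log-odds depend on balance and income
log_odds <- -6 + 0.005 * sim$balance + 0.00002 * sim$income
sim$default <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: family = binomial selects the logit link
fit <- glm(default ~ income + balance, data = sim, family = binomial)

# The "Std. Error" column of the coefficient table holds the standard errors
se_formula <- summary(fit)$coefficients[, "Std. Error"]

# Equivalent route: square roots of the diagonal of the estimated covariance matrix
se_vcov <- sqrt(diag(vcov(fit)))

print(se_formula)
```

Both routes should give the same three numbers (intercept, income, balance); which one to use is a matter of taste.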

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Fit a logistic regression model on the entire dataset. Use income and balance to predict the probability of default. Store the model in glm.fit. Store the estimated standard errors in the variable glm.se.

  2. Write a function, boot.fn(), that takes as input the Default data set as well as an index of the observations, and that outputs the coefficient estimates for income and balance in the multiple logistic regression model.

    1. The function takes 2 inputs: data and index.
    2. Inside the function, fit a logistic regression model on the Default data. You can reuse the code from question 1, but adapt it so that the function arguments data and index are actually used: the model should be fitted only on the rows selected by index.
    3. Return the intercept and slope estimates of the model. You can call the function coef() on the model. These coefficients will be used in question 3 to create bootstrap estimates of the standard errors.

    In order to test your function boot.fn(), do the following:

    1. Split the data into a training set and a validation set. Put 50% of the data (5000 rows) in the training set and the other 50% in the validation set. Use a seed value of 1. Store the indices of the training set in train.
    2. Call your function boot.fn() with the Default data set and the training indices train as arguments. Store the result in boot.test.
    3. Inspect boot.test. These are the coefficient estimates of the logistic regression model.


  3. Use the boot() function together with your boot.fn() function to estimate the standard errors of the logistic regression coefficients for income and balance. Don’t forget to load the boot library in your R session. Set a seed of 1 before running boot() and specify that we want 10 bootstrap samples. Store the result in boot.se. (Note: this command takes a few seconds to run.)

  4. Inspect boot.se, the bootstrap estimates for the standard errors. Compare them with the estimated standard errors glm.se obtained with the glm() function.

    • MC1:
      Are the estimated standard errors obtained by the two methods similar?
      • 1: Yes, estimated standard errors are pretty close
      • 2: No, estimated standard errors are not similar
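
The general pattern behind questions 2 to 4 can be sketched on simulated data as follows. This is a minimal illustration under stated assumptions, not a reference solution: the data frame sim, the function coef_fn, the index vector half and the choice of 200 replicates are all hypothetical and differ from the names and settings the exercise asks for.

```r
library(boot)  # recommended package, shipped with standard R installations

# Synthetic data with two predictors and a binary response (made-up values)
set.seed(1)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * sim$x1 - 0.8 * sim$x2))

# Statistic for boot(): refit the model on the rows selected by `index`
# and return the coefficient estimates
coef_fn <- function(data, index) {
  coef(glm(y ~ x1 + x2, data = data[index, ], family = binomial))
}

# Sanity check of the statistic on a random half of the rows
half <- sample(n, n / 2)
print(coef_fn(sim, half))

# boot() calls coef_fn repeatedly, each time with row indices resampled
# with replacement; R is the number of bootstrap replicates
set.seed(1)
boot_out <- boot(data = sim, statistic = coef_fn, R = 200)

# boot_out$t has one row per replicate and one column per coefficient;
# the column-wise standard deviation is the bootstrap standard error
se_boot <- apply(boot_out$t, 2, sd)

# Formula-based standard errors from a fit on the full data, for comparison
full_fit <- glm(y ~ x1 + x2, data = sim, family = binomial)
se_formula <- summary(full_fit)$coefficients[, "Std. Error"]

print(cbind(se_boot, se_formula))
```

Printing the boot object itself reports the same standard deviations in its "std. error" column. Subsetting with data[index, ] is used here because it keeps the statistic function independent of how glm() evaluates its subset argument.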


Assume that: