In this exercise, we will generate simulated data, and will then use this data to perform best subset selection.
Use the rnorm() function to generate a predictor x and a noise vector eps of the same length.
Generate a response vector y of the same length as x.
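The two data-generation steps could be sketched as follows. The seed, the length n = 100, the cubic form of the model and its coefficients are all assumptions made for illustration; the text above does not state them.

```r
# Assumed settings: none of these values come from the exercise text
set.seed(1)   # fixed seed so the simulation is reproducible
n <- 100      # assumed length of x, eps and y

x <- rnorm(n)     # predictor drawn from a standard normal
eps <- rnorm(n)   # noise vector drawn from a standard normal

# Assumed cubic model with hypothetical coefficients
b0 <- 1
b1 <- 2
b2 <- -3
b3 <- 0.5
y <- b0 + b1 * x + b2 * x^2 + b3 * x^3 + eps
```

Any other coefficients would work equally well; the only requirement is that x, eps and y share the same length.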
Use the regsubsets() function to perform best subset selection in order to choose the best model containing the predictors. Use the data.frame() function to create a single data set containing both x and y. Store the results for the three metrics (Cp, BIC and adjusted R²) in min_cp, min_bic and max_adjR2.
Have a look at some plots in RStudio to understand the behaviour of the three metrics.
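A minimal sketch of the selection step and the plots, under several assumptions: the simulated data use an assumed seed, length and cubic model; the candidate predictors are taken to be the polynomial terms of x up to degree 10; and min_cp, min_bic and max_adjR2 are interpreted as the model size chosen by each metric. None of these details are confirmed by the text above.

```r
library(leaps)  # provides regsubsets()

# Simulated data; seed, length and coefficients are illustrative assumptions
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
y <- 1 + 2 * x - 3 * x^2 + 0.5 * x^3 + eps
dat <- data.frame(x = x, y = y)  # single data set holding both x and y

# Best subset selection over raw polynomial terms of x (degree 10 is assumed)
fit <- regsubsets(y ~ poly(x, 10, raw = TRUE), data = dat, nvmax = 10)
fit_sum <- summary(fit)

# Assumed meaning: the model size picked by each criterion
min_cp <- which.min(fit_sum$cp)
min_bic <- which.min(fit_sum$bic)
max_adjR2 <- which.max(fit_sum$adjr2)

# One panel per metric, with the selected model size marked in red
op <- par(mfrow = c(1, 3))
plot(fit_sum$cp, type = "b", xlab = "Number of predictors", ylab = "Cp")
points(min_cp, fit_sum$cp[min_cp], col = "red", pch = 19)
plot(fit_sum$bic, type = "b", xlab = "Number of predictors", ylab = "BIC")
points(min_bic, fit_sum$bic[min_bic], col = "red", pch = 19)
plot(fit_sum$adjr2, type = "b", xlab = "Number of predictors",
     ylab = "Adjusted R2")
points(max_adjR2, fit_sum$adjr2[max_adjR2], col = "red", pch = 19)
par(op)  # restore the previous plotting layout
```

Cp and BIC are minimised while adjusted R² is maximised, which is why the first two use which.min() and the last uses which.max().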
Store the coefficients of the best model according to the adjusted R² in coef_bestmodel.
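Extracting the coefficients might look like the sketch below. It uses the same assumed simulation settings and degree-10 polynomial predictors as an illustration, not as the exercise's prescribed setup; fit is a hypothetical object name.

```r
library(leaps)  # provides regsubsets() and its coef() method

# Simulated data; seed, length and coefficients are illustrative assumptions
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
y <- 1 + 2 * x - 3 * x^2 + 0.5 * x^3 + eps
dat <- data.frame(x = x, y = y)

# Best subset selection (polynomial degree 10 is assumed)
fit <- regsubsets(y ~ poly(x, 10, raw = TRUE), data = dat, nvmax = 10)

# Size of the model with the highest adjusted R-squared
best_size <- which.max(summary(fit)$adjr2)

# Named coefficient vector (intercept included) of that model
coef_bestmodel <- coef(fit, id = best_size)
print(coef_bestmodel)
```

The id argument of coef() selects the model by its number of predictors, so the returned vector has one more entry than best_size because of the intercept.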
Repeat question 3, using forward stepwise selection and also using backwards stepwise selection.
How does your answer compare to the results in question 3? Store the coefficients of the forward and backward models with the highest adjusted R² in coef.fwd and coef.bwd.
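The stepwise variants differ from best subset selection only in the method argument of regsubsets(). The sketch below again relies on assumed simulation settings and an assumed polynomial degree of 10; fit.fwd, fit.bwd and form are hypothetical names.

```r
library(leaps)  # provides regsubsets()

# Simulated data; seed, length and coefficients are illustrative assumptions
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
y <- 1 + 2 * x - 3 * x^2 + 0.5 * x^3 + eps
dat <- data.frame(x = x, y = y)

# Candidate predictors: raw polynomial terms of x (degree 10 is assumed)
form <- y ~ poly(x, 10, raw = TRUE)

# Forward and backward stepwise selection
fit.fwd <- regsubsets(form, data = dat, nvmax = 10, method = "forward")
fit.bwd <- regsubsets(form, data = dat, nvmax = 10, method = "backward")

# Coefficients of the model with the highest adjusted R-squared in each path
coef.fwd <- coef(fit.fwd, id = which.max(summary(fit.fwd)$adjr2))
coef.bwd <- coef(fit.bwd, id = which.max(summary(fit.bwd)$adjr2))

print(coef.fwd)
print(coef.bwd)
```

Comparing names(coef.fwd) and names(coef.bwd) with the best subset result shows whether the three procedures select the same terms.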
Assume that the leaps library has been loaded.