This question relates to the College data set, available in the ISLR2 package.

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

Perform forward stepwise selection on the training set:
1. Split the data into a training set and a test set with the sample() function. Put 50% of the data (388 rows) in the training set and the remaining rows in the test set. Use a seed value of 1. Store the indices of the training set in train. Use these indices to create College.train and College.test.
2. Using out-of-state tuition Outstate as the response and the other variables as the predictors, perform forward stepwise selection on the training set. Use the regsubsets() function from the leaps package. Set the nvmax parameter to the total number of predictors, 17. Store the object in subset.fit.
3. Call the summary() function on the object and inspect the components that hold the adjusted \(R^2\), BIC, and \(C_p\) for each of the subsets.
  - MC1:
    Plot the performance measures against the number of predictors. What is the smallest subset size that gives good performance?
    - 1: 4 predictors
    - 2: 5 predictors
    - 3: 6 predictors
    - 4: 7 predictors
    - 5: 8 predictors
    - 6: 9 predictors
    - 7: 10 predictors
4. Imagine that the minimum size of the subset is 4 predictors. Store the coefficients of this model in subset.coef. (Hint: you can call the coef() function with the id parameter on the subset.fit object.)
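The forward stepwise steps above could be sketched roughly as follows. This is one possible approach, not the reference solution: the exact form of the sample() call is an assumption, and subset.summary is a hypothetical helper name that does not appear in the exercise.

```r
library(ISLR2)  # provides the College data set
library(leaps)  # provides regsubsets()

set.seed(1)
# Assumption: draw 388 of the 777 row indices (50%, rounded down) for training
train <- sample(nrow(College), floor(nrow(College) / 2))
College.train <- College[train, ]
College.test  <- College[-train, ]

# Forward stepwise selection, allowing up to all 17 predictors
subset.fit <- regsubsets(Outstate ~ ., data = College.train,
                         nvmax = 17, method = "forward")

# subset.summary is a hypothetical name used only in this sketch
subset.summary <- summary(subset.fit)

# One value per subset size (1 to 17 predictors)
subset.summary$adjr2
subset.summary$bic
subset.summary$cp

# Plot each measure against the number of predictors
par(mfrow = c(1, 3))
plot(subset.summary$adjr2, type = "b",
     xlab = "Number of predictors", ylab = "Adjusted R^2")
plot(subset.summary$bic, type = "b",
     xlab = "Number of predictors", ylab = "BIC")
plot(subset.summary$cp, type = "b",
     xlab = "Number of predictors", ylab = "Cp")

# Coefficients (intercept included) of the 4-predictor model
subset.coef <- coef(subset.fit, id = 4)
```

When reading the plots, higher is better for adjusted \(R^2\), while lower is better for BIC and \(C_p\).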
Fit a GAM on the training data, using out-of-state tuition Outstate as the response and the following features (in this order, please):
1. Private
2. Smoothing spline of Room.Board with df=2
3. Smoothing spline of PhD with df=2
4. Smoothing spline of Expend with df=5
  Store the model in gam.fit. Set par(mfrow = c(2, 2)) and plot the results. Reflect on your findings.
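A minimal sketch of the GAM fit described above, assuming the same training split as in the first part (the sample() call is repeated here so the snippet runs on its own, and its exact form is an assumption). The se and col arguments to plot() are optional choices for this sketch, not requirements of the exercise.

```r
library(ISLR2)  # provides the College data set
library(gam)    # provides gam() and the smoothing-spline term s()

set.seed(1)
# Assumption: same 388-row training split as in the stepwise selection part
train <- sample(nrow(College), floor(nrow(College) / 2))
College.train <- College[train, ]

# Private enters linearly; the other three terms are smoothing splines
gam.fit <- gam(Outstate ~ Private + s(Room.Board, df = 2) +
                 s(PhD, df = 2) + s(Expend, df = 5),
               data = College.train)

# One panel per term, with standard-error bands
par(mfrow = c(2, 2))
plot(gam.fit, se = TRUE, col = "blue")
```

Each panel shows the fitted contribution of one term to Outstate, which makes it easier to judge which relationships look clearly non-linear.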
Evaluate the model obtained on the test set.
1. Make predictions on the test set. Store the predictions in gam.preds.
2. Compute the test MSE. Store the result in mse.
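The evaluation steps could look roughly like the sketch below. It repeats the split and the GAM fit so that it runs on its own; the form of the sample() call is again an assumption rather than the reference solution.

```r
library(ISLR2)  # provides the College data set
library(gam)    # provides gam() and the smoothing-spline term s()

set.seed(1)
# Assumption: same split as before, 388 training rows and 389 test rows
train <- sample(nrow(College), floor(nrow(College) / 2))
College.train <- College[train, ]
College.test  <- College[-train, ]

gam.fit <- gam(Outstate ~ Private + s(Room.Board, df = 2) +
                 s(PhD, df = 2) + s(Expend, df = 5),
               data = College.train)

# Predict out-of-state tuition for the held-out rows
gam.preds <- predict(gam.fit, newdata = College.test)

# Test MSE: mean squared difference between observed and predicted values
mse <- mean((College.test$Outstate - gam.preds)^2)
```

Taking sqrt(mse) gives the error in the same units as Outstate (dollars), which can be easier to interpret.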

Assume that:

The ISLR2 library has been loaded
The College dataset has been loaded and attached
The leaps library has been loaded
The gam library has been loaded