This problem involves the Boston data set from the MASS library, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

Questions

Some of the exercises are not tested by Dodona (for example, the plots), but it is still useful to try them.

  1. For each predictor, fit a simple linear regression model to predict the response (remember that chas is a categorical variable). In which of the models is there a statistically significant (p < 0.05) association between the predictor and the response? Delete the non-significant variables from the list significant.variables.single below. Create some plots to back up your assertions.

  2. Fit a multiple regression model to predict the response using all of the predictors. For which predictors can we reject the null hypothesis \(H_0 : \beta_j = 0\) (p < 0.05)? Delete the non-significant variables from the list significant.variables.multiple below.

  3. How do your results from 1 compare to your results from 2? Create a plot displaying the univariate regression coefficients from 1 on the x-axis, and the multiple regression coefficients from 2 on the y-axis. That is, each predictor is displayed as a single point on the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

  4. MC1: Which of these statements is false (only one answer)?
    1. It makes sense for the multiple regression to suggest no relationship between the response and some of the predictors while the simple linear regressions imply the opposite, because several of the predictors are strongly correlated with one another.
    2. In the simple regression case, the slope term represents the average effect of an increase in the predictor, ignoring other predictors.
    3. In the multiple regression case, the slope term represents the average effect of an increase in the predictor, while holding other predictors fixed.
    4. In the simple regression case, the estimate for the beta coefficients will be more accurate since only one predictor is being measured.
  5. Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor \(X\), fit a model of the form \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon\). Delete the variables whose cubic coefficient is not significant from the list significant.variables.cubic below.
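
The mechanics behind questions 1, 2, 3 and 5 could be sketched in R roughly as follows. This is only one possible approach, not the reference solution. The helper names (predictors, simple.p, multi.p, cubic.p and so on) are illustrative assumptions and are not part of the exercise template. The sketch leaves chas in its 0/1 coding, which gives the same slope and p-value as a two-level factor, and skips it in the cubic fits because a binary variable cannot support a cubic polynomial. It also uses poly(), which fits orthogonal polynomials: the lower-order coefficients differ from the raw form in question 5, but the test on the cubic term is the same.

```r
# Sketch only: all helper names below are illustrative assumptions.
library(MASS)  # provides the Boston data frame

# Every column except the response is a predictor.
predictors <- setdiff(names(Boston), "crim")

# Question 1: one simple regression per predictor.
simple.fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "crim"), data = Boston)
})
names(simple.fits) <- predictors

# Row 2 of the coefficient table is the slope; column 4 is its p-value.
simple.coef <- sapply(simple.fits, function(fit) coef(fit)[[2]])
simple.p    <- sapply(simple.fits, function(fit) summary(fit)$coefficients[2, 4])

# Question 2: a single multiple regression on all predictors.
multi.fit  <- lm(crim ~ ., data = Boston)
multi.coef <- coef(multi.fit)[predictors]
multi.p    <- summary(multi.fit)$coefficients[predictors, 4]

# Question 3: simple-regression slopes (x) against multiple-regression slopes (y).
plot(simple.coef, multi.coef,
     xlab = "Simple regression coefficient",
     ylab = "Multiple regression coefficient")

# Question 5: cubic fit per predictor, skipping the binary chas.
cubic.predictors <- setdiff(predictors, "chas")
cubic.p <- sapply(cubic.predictors, function(p) {
  fit <- lm(as.formula(paste0("crim ~ poly(", p, ", 3)")), data = Boston)
  summary(fit)$coefficients[4, 4]  # row 4 is the cubic term
})
```

The predictors that pass the threshold in each setting can then be read off with, for example, names(simple.p)[simple.p < 0.05], and compared against the entries kept in the corresponding significant.variables list.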