This exercise focuses on the problem of collinearity.

Questions

Some of the exercises are not tested by Dodona (the plots, for example), but it is still useful to try them.

  1. Run the following commands in R.
set.seed(1)                                  # make the simulation reproducible
x1 <- runif(100)                             # first predictor, uniform on [0, 1]
x2 <- 0.5 * x1 + rnorm(100) / 10             # second predictor, strongly correlated with x1
y <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)      # response generated from a linear model plus noise

The last line generates the response y from a linear model in which y is a function of x1 and x2. Write out the form of this linear model. What are the regression coefficients?
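(Written out, "the form of the linear model" means an equation such as \(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\), with \(\epsilon\) the noise term added by rnorm(100); the numerical values of \(\beta_0\), \(\beta_1\) and \(\beta_2\) can be read directly from the line that creates y.)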

  2. What is the correlation between x1 and x2? Store the correlation in x.cor. Create a scatterplot displaying the relationship between the two variables. (A sketch of the R calls for questions 2-5 appears after this list.)

  3. Using this data, fit a least squares regression to predict y using x1 and x2. Store the model in lm.fit1.
    • Describe the results obtained.
    • What are \(\hat{\beta}_0\), \(\hat{\beta}_1\) and \(\hat{\beta}_2\)? Store the answers in beta.hat0, beta.hat1 and beta.hat2, respectively.
    • How do these relate to the true \(\beta_0\), \(\beta_1\) and \(\beta_2\)?
    • MC1: Can you reject the null hypothesis \(H_0 : \beta_1 = 0\) (p < 0.05)?
      1. Yes, we can reject the null hypothesis
      2. No, we cannot reject the null hypothesis
    • MC2: How about the null hypothesis \(H_0 : \beta_2 = 0\) (p < 0.05)?
      1. Yes, we can reject the null hypothesis
      2. No, we cannot reject the null hypothesis
  4. Now fit a least squares regression to predict y using only x1. Store the model in lm.fit2.
    • Comment on your results.
    • MC3: Can you reject the null hypothesis \(H_0 : \beta_1 = 0\)?
      1. Yes, we can reject the null hypothesis
      2. No, we cannot reject the null hypothesis
  5. Now fit a least squares regression to predict y using only x2. Store the model in lm.fit3.
    • Comment on your results.
    • MC4: Can you reject the null hypothesis \(H_0 : \beta_1 = 0\)?
      1. Yes, we can reject the null hypothesis
      2. No, we cannot reject the null hypothesis
  6. MC5: Do the results obtained in questions 3-5 contradict each other? Explain your answer.
    1. Yes, the results contradict each other: in the combined model x2 is not significant, while in the isolated model x2 is significant.
    2. Yes, the results contradict each other: in the combined model x1 is not significant, while in the isolated model x1 is significant.
    3. No, the results do not contradict each other: the significance of the independent variables does not change between the combined and the isolated models.
    4. No, the results do not contradict each other: since the predictors x1 and x2 are highly correlated, the importance of the x2 variable has been masked by the presence of collinearity.
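
As a starting point, here is a minimal sketch (base R only) of the calls behind questions 2-5. Only the object names x.cor, lm.fit1, lm.fit2, lm.fit3, beta.hat0, beta.hat1 and beta.hat2 are prescribed by the questions above; the plot title is illustrative.

# Question 2: correlation and scatterplot of the two predictors
x.cor <- cor(x1, x2)
plot(x1, x2, main = "x2 versus x1")

# Question 3: least squares fit of y on x1 and x2, with the estimated coefficients
lm.fit1 <- lm(y ~ x1 + x2)
summary(lm.fit1)                 # coefficient table, standard errors and p-values
beta.hat0 <- coef(lm.fit1)[1]
beta.hat1 <- coef(lm.fit1)[2]
beta.hat2 <- coef(lm.fit1)[3]

# Questions 4 and 5: least squares fits on each predictor separately
lm.fit2 <- lm(y ~ x1)
summary(lm.fit2)
lm.fit3 <- lm(y ~ x2)
summary(lm.fit3)

Comparing the p-values that summary() reports for x1 and x2 in lm.fit1 with those in lm.fit2 and lm.fit3 is what MC1-MC4 ask about.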

Now suppose we obtain one additional observation, which was unfortunately mismeasured.

x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y <- c(y, 6)
  7. Re-fit the linear models from questions 3 to 5 using this new data. What effect does this new observation have on each of the models? In each model, is this observation an outlier? A high-leverage point? Reflect on this. (A sketch of the refit and some standard diagnostics follows below.)
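
A minimal sketch of the refit and of two standard diagnostics, shown here for the combined model only; rstudent() and hatvalues() are base R, the index 101 refers to the newly appended observation, and the thresholds mentioned in the comments are the usual rules of thumb rather than part of the exercise.

# Re-fit the three models on the data that now include the extra observation
lm.fit1 <- lm(y ~ x1 + x2)
lm.fit2 <- lm(y ~ x1)
lm.fit3 <- lm(y ~ x2)

# Diagnostics for the new (101st) observation in the combined model
rstudent(lm.fit1)[101]      # studentized residual; a large absolute value suggests an outlier
hatvalues(lm.fit1)[101]     # leverage; compare with the average leverage (p + 1)/n
plot(lm.fit1)               # standard diagnostic plots, including residuals versus leverage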