In this problem we will investigate the t-statistic for the null hypothesis \(H_0 : \beta = 0\) in simple linear regression without an intercept.

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

To begin, we generate a predictor \(x\) and a response \(y\) as follows.

set.seed(1)              # fix the RNG seed so the results are reproducible
x <- rnorm(100)          # predictor: 100 standard-normal draws
y <- 2 * x + rnorm(100)  # response: true slope 2 plus standard-normal noise
  1. Perform a simple linear regression with dependent variable \(y\) and independent variable \(x\), without an intercept. Store the model in lm.fit1. Report the coefficient estimate \(\hat{\beta}\), the standard error of this estimate, and the t-statistic and p-value associated with the null hypothesis \(H_0\) in the variables beta.hat1, se1, t.stat1 and p.value1. (You can fit a regression without an intercept with the command lm(y ~ 0 + x), and you can retrieve the exact values of the coefficient estimates, standard errors… with summary(lm.fit)$coefficients.)
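
    A minimal sketch of what this step might look like (coefs1 is an illustrative name; the exercise only fixes the variable names listed above):

    lm.fit1 <- lm(y ~ 0 + x)
    coefs1 <- summary(lm.fit1)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
    beta.hat1 <- coefs1[1, 1]
    se1 <- coefs1[1, 2]
    t.stat1 <- coefs1[1, 3]
    p.value1 <- coefs1[1, 4]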

  2. Now perform a simple linear regression with dependent variable \(x\) and independent variable \(y\), without an intercept. Store the model in lm.fit2. Report the coefficient estimate \(\hat{\beta}\), the standard error of this estimate, and the t-statistic and p-value associated with the null hypothesis \(H_0\) in the variables beta.hat2, se2, t.stat2 and p.value2.
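
    The same pattern applies with the roles of x and y interchanged (coefs2 is again an illustrative name):

    lm.fit2 <- lm(x ~ 0 + y)
    coefs2 <- summary(lm.fit2)$coefficients
    beta.hat2 <- coefs2[1, 1]
    se2 <- coefs2[1, 2]
    t.stat2 <- coefs2[1, 3]
    p.value2 <- coefs2[1, 4]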

  3. MC1: What is the relationship between the results obtained in 1. and 2.?
    1. Both results in 1. and 2. reflect the same line. Therefore, the coefficient estimates and standard errors are equal. In other words, \(y = 2x + \varepsilon\) could also be written as \(x = 0.5(y - \varepsilon)\).
    2. Both results in 1. and 2. reflect the same line. Therefore, the t-statistics and p-values are equal. In other words, \(y = 2x + \varepsilon\) could also be written \(x = 0.5(y - \varepsilon)\).
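
    To see which quantities actually coincide, it can help to print the two coefficient rows side by side; a quick sketch, assuming lm.fit1 and lm.fit2 from 1. and 2. are available:

    rbind(fit1 = summary(lm.fit1)$coefficients[1, ],
          fit2 = summary(lm.fit2)$coefficients[1, ])
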
  4. For the regression of \(Y\) onto \(X\) without an intercept, the t-statistic for \(H_0 : \beta = 0\) takes the form \(\hat{\beta}/SE(\hat{\beta})\), where \(\hat{\beta}\) is given by \(\hat\beta=\left ( \sum_{i=1}^{n}x_i y_i \right )/\left ( \sum_{i=1}^{n}x_{i}^2 \right )\) and where

    \[SE(\hat{\beta}) = \sqrt{\frac{\sum_{i=1}^n(y_i - x_i\hat{\beta})^2}{(n - 1)\sum_{i=1}^nx_i^2}}\]

    Show algebraically, and confirm numerically in R, that the t-statistic can be written as

    \[\frac{\sqrt{n - 1}\sum_{i=1}^nx_iy_i}{\sqrt{(\sum_{i=1}^nx_i^2)(\sum_{i=1}^ny_i^2) - (\sum_{i=1}^nx_iy_i)^2}}\]
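
    For the numerical confirmation, one possible sketch (n, num, den and t.manual are illustrative names, assuming t.stat1 from 1. is still in the workspace):

    n <- length(x)
    num <- sqrt(n - 1) * sum(x * y)
    den <- sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)
    t.manual <- num / den
    all.equal(t.manual, t.stat1)  # should be TRUE
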
  5. Using the results from 4., argue that the t-statistic for the regression of \(y\) onto \(x\) is the same as the t-statistic for the regression of \(x\) onto \(y\).
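
    One way to check this symmetry numerically (t.swapped is an illustrative name, reusing n and t.manual from the sketch in 4.):

    # the formula from 4. with the roles of x and y interchanged
    t.swapped <- sqrt(n - 1) * sum(y * x) / sqrt(sum(y^2) * sum(x^2) - sum(y * x)^2)
    all.equal(t.manual, t.swapped)  # TRUE: the expression is unchanged when x and y trade places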

  6. In R, show that when the regression is performed with an intercept, the t-statistic for \(H_0 : \beta_1 = 0\) is the same for the regression of \(y\) onto \(x\) as it is for the regression of \(x\) onto \(y\). Store the models in lm.fit3 and lm.fit4, respectively. Store the t-statistics for \(H_0 : \beta_1 = 0\) in t.stat3 and t.stat4, respectively.
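
    A minimal sketch, following the same extraction pattern as before (with an intercept, row 2 of the coefficient table holds the slope \(\beta_1\); row 1 is the intercept):

    lm.fit3 <- lm(y ~ x)
    lm.fit4 <- lm(x ~ y)
    t.stat3 <- summary(lm.fit3)$coefficients[2, 3]
    t.stat4 <- summary(lm.fit4)$coefficients[2, 3]
    all.equal(t.stat3, t.stat4)  # should be TRUE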