In this exercise you will create some simulated data and fit simple linear regression models to it. Make sure to call `set.seed(1)` before starting, to ensure consistent results. Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.
Using the `rnorm()` function, create a vector `x` containing 100 observations drawn from a \(N(0,1)\) distribution. This represents a feature, \(X\). Remember that `rnorm()` takes the arguments \(\mu\) and \(\sigma\), while the mathematical distribution is specified as \(N(\mu, \sigma^2)\).
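A minimal sketch; `mean = 0` and `sd = 1` are the defaults of `rnorm()`, so they could also be omitted:

```r
set.seed(1)                        # fix the seed for reproducible results
x <- rnorm(100, mean = 0, sd = 1)  # 100 draws from N(0, 1)
```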
Using the `rnorm()` function, create a vector `eps` containing 100 observations drawn from a \(N(0, 0.25)\) distribution (note that 0.25 here is the variance, not the standard deviation).
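Since \(N(0, 0.25)\) specifies the variance, the `sd` argument must be \(\sqrt{0.25} = 0.5\); a sketch:

```r
# N(0, 0.25): variance 0.25, so standard deviation sqrt(0.25) = 0.5
eps <- rnorm(100, mean = 0, sd = 0.5)
```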
Using `x` and `eps`, generate a vector `y` according to the model \(Y = -1 + 0.5X + \varepsilon\).
What is the length of the vector `y`? Store the value in `y.length`.
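Assuming `x` and `eps` were generated as in the previous steps, a sketch:

```r
set.seed(1)                            # same seed as before
x <- rnorm(100)                        # feature (previous step)
eps <- rnorm(100, mean = 0, sd = 0.5)  # noise (previous step)

y <- -1 + 0.5 * x + eps  # Y = -1 + 0.5 X + eps
y.length <- length(y)    # one response per observation, so 100
```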
What are the values of \(\beta_0\) and \(\beta_1\) in this linear model? Store them in `beta0` and `beta1`.
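These can be read directly off the model \(Y = -1 + 0.5X + \varepsilon\):

```r
beta0 <- -1   # population intercept
beta1 <- 0.5  # population slope
```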
Create a scatterplot displaying the relationship between `x` and `y`. Comment on what you observe.
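A sketch using base R graphics (the title is a free choice):

```r
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps

plot(x, y, main = "Simulated data")
# The points scatter around a straight line with negative intercept and
# moderate positive slope, consistent with the model that generated them.
```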
Fit a least squares linear model to predict `y` using `x`. Store the model in `lm.fit1`. Comment on the model obtained.
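A sketch, assuming the data from the previous steps:

```r
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps

lm.fit1 <- lm(y ~ x)  # least squares fit of y on x
summary(lm.fit1)
# The estimates should lie close to the true values beta0 = -1 and
# beta1 = 0.5, and the slope should be clearly significant.
```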
Store \(\hat{\beta}_0\) and \(\hat{\beta}_1\) in `beta.hat0` and `beta.hat1`, respectively. How do \(\hat{\beta}_0\) and \(\hat{\beta}_1\) compare to \(\beta_0\) and \(\beta_1\)?
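`coef()` returns a named vector, so `unname()` is used here to store the bare numbers (whether the names matter depends on how the values are checked — an assumption):

```r
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps
lm.fit1 <- lm(y ~ x)

beta.hat0 <- unname(coef(lm.fit1)[1])  # estimated intercept
beta.hat1 <- unname(coef(lm.fit1)[2])  # estimated slope
# Both estimates should be close to beta0 = -1 and beta1 = 0.5.
```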
Display the least squares line on the scatterplot obtained above. Draw the population regression line on the plot, in a different color. Use the `legend()` function to create an appropriate legend. The plot should look like this:
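The figure itself is not reproduced here. A sketch that overlays both lines and adds a legend (colors and legend position are free choices):

```r
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps
lm.fit1 <- lm(y ~ x)

plot(x, y)
abline(lm.fit1, col = "red")           # least squares line
abline(a = -1, b = 0.5, col = "blue")  # population regression line
legend("topleft", legend = c("Least squares", "Population"),
       col = c("red", "blue"), lty = 1)
```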
Now fit a polynomial regression model that predicts \(y\) using \(x\) and \(x^2\). Store the model in `lm.fit2`.
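A sketch using `I()` so that `x^2` is treated as a squared predictor rather than a formula operator (`poly(x, 2)` would also work, with a different parameterization):

```r
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = 0.5)
y <- -1 + 0.5 * x + eps

lm.fit2 <- lm(y ~ x + I(x^2))  # regression on x and x^2
summary(lm.fit2)
# Since the true relationship is linear, the quadratic term is not
# expected to be significant.
```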