The near-zero p-value associated with the quadratic term suggests that it leads to an improved model. We use the anova() function to further quantify the extent to which the quadratic fit is superior to the linear fit.
> lm.fit <- lm(medv ~ lstat)
> anova(lm.fit, lm.fit2)
Analysis of Variance Table

Model 1: medv ~ lstat
Model 2: medv ~ lstat + I(lstat^2)
  Res.Df   RSS Df Sum of Sq     F    Pr(>F)
1    504 19472
2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model 1 represents the linear submodel containing only one predictor, lstat, while Model 2 corresponds to the larger quadratic model that has two predictors, lstat and lstat². The anova() function performs a hypothesis test comparing the two models. The null hypothesis is that the two models fit the data equally well, and the alternative hypothesis is that the full model is superior. Here the F-statistic is 135 and the associated p-value is virtually zero. This provides very clear evidence that the model containing the predictors lstat and lstat² is far superior to the model that only contains the predictor lstat. This is not surprising, since earlier we saw evidence for non-linearity in the relationship between medv and lstat.
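The same nested-model F-test can be illustrated on simulated data, where we control the true relationship. The sketch below (using made-up data, not the Boston set) generates a genuinely quadratic response, fits the linear submodel and the quadratic model, and compares them with anova(); the variable names fit1, fit2, and comparison are illustrative choices, not from the lab.

```r
# Illustrative sketch on simulated data: when the true relationship
# is quadratic, the nested-model F-test rejects the linear submodel.
set.seed(1)
x <- runif(200, 0, 10)
y <- 2 + 1.5 * x - 0.3 * x^2 + rnorm(200)  # quadratic truth + noise

fit1 <- lm(y ~ x)           # linear submodel (null)
fit2 <- lm(y ~ x + I(x^2))  # quadratic model (alternative)

# anova() on two nested lm fits performs the F-test;
# the second row holds the F-statistic and its p-value.
comparison <- anova(fit1, fit2)
print(comparison)
```

Because the curvature here is strong relative to the noise, the Pr(>F) entry in the second row is essentially zero, mirroring the Boston result above.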
Use the anova() function to compare the second order and third order models (which you created in the previous exercise) and store the output in Boston.anova.

MC1: Did the third order term of lstat significantly improve the model?

Assume that:
the MASS library has been loaded
the Boston dataset has been loaded and attached
the second and third order models are stored in lm.fit2 and lm.fit3 respectively