The plots showed a roughly linear function for year. We can perform a
series of ANOVA tests to determine which of these three models is best:
a GAM that excludes year (\(\mathcal{M}_1\)), a GAM that uses a linear
function of year (\(\mathcal{M}_2\)), or a GAM that uses a spline function
of year (\(\mathcal{M}_3\)) (see the previous exercise for gam.m3).
gam.m1 <- gam(wage ~ s(age, 5) + education, data = Wage)
gam.m2 <- gam(wage ~ year + s(age, 5) + education, data = Wage)
anova(gam.m1, gam.m2, gam.m3, test = "F")
Analysis of Deviance Table
Model 1: wage ~ s(age, 5) + education
Model 2: wage ~ year + s(age, 5) + education
Model 3: wage ~ s(year, 4) + s(age, 5) + education
  Resid. Df Resid. Dev Df Deviance       F    Pr(>F)
1      2990    3711731
2      2989    3693842  1  17889.2 14.4771 0.0001447 ***
3      2986    3689770  3   4071.1  1.0982 0.3485661
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We find that there is compelling evidence that a GAM with a linear function
of year is better than a GAM that does not include year at all
(p-value = 0.00014). However, there is no evidence that a non-linear function
of year is needed (p-value = 0.349). In other words, based on the results
of this ANOVA, \(\mathcal{M}_2\) is preferred.
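The F statistics and p-values in the table above can be reproduced by hand, which makes it clearer what each comparison measures. The sketch below uses only base R and the numbers copied from the printed table. It assumes that the F test divides each drop in deviance (per degree of freedom) by the dispersion estimate of the largest model, \(\mathcal{M}_3\), and uses that model's residual degrees of freedom as the denominator; the variable names are illustrative only.

```r
# Numbers copied from the analysis-of-deviance table above.
resid_df  <- c(m1 = 2990, m2 = 2989, m3 = 2986)
resid_dev <- c(m1 = 3711731, m2 = 3693842, m3 = 3689770)

# Drops in degrees of freedom and deviance for M1 -> M2 and M2 -> M3
# (the "Df" and "Deviance" columns of the table).
df_drop  <- c(1, 3)
dev_drop <- c(17889.2, 4071.1)

# Assumption: the denominator is the dispersion of the largest model (M3).
scale <- resid_dev[["m3"]] / resid_df[["m3"]]

# F statistic = (deviance drop per degree of freedom) / dispersion.
f_stat <- (dev_drop / df_drop) / scale

# Upper-tail probability under F(df_drop, residual df of M3).
p_val <- pf(f_stat, df_drop, resid_df[["m3"]], lower.tail = FALSE)

print(data.frame(df_drop, dev_drop, f_stat, p_val))
```

Under these assumptions, the first row corresponds to the comparison of \(\mathcal{M}_1\) with \(\mathcal{M}_2\) and the second to the comparison of \(\mathcal{M}_2\) with \(\mathcal{M}_3\).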
The summary() function produces a summary of the gam fit.
summary(gam.m3)
Call: gam(formula = wage ~ s(year, 4) + s(age, 5) + education, data = Wage)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-119.43  -19.70   -3.33   14.17  213.48
(Dispersion Parameter for gaussian family taken to be 1235.69)
Null Deviance: 5222086 on 2999 degrees of freedom
Residual Deviance: 3689770 on 2986 degrees of freedom
AIC: 29887.75
Number of Local Scoring Iterations: NA
Anova for Nonparametric Effects
            Npar Df Npar F  Pr(F)
(Intercept)
s(year, 4)        3  1.086 0.3537
s(age, 5)         4 32.380 <2e-16 ***
education
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-values for year and age correspond to a null hypothesis of a linear
relationship versus the alternative of a non-linear relationship. The large
p-value for year reinforces our conclusion from the ANOVA test that a linear
function is adequate for this term. However, there is very clear evidence
that a non-linear term is required for age.
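As a rough check on how to read the "Anova for Nonparametric Effects" table, the p-values can be recomputed from the printed Npar F statistics. This is a minimal sketch in base R; it assumes the reference distribution is an F distribution with the Npar Df as numerator degrees of freedom and the residual degrees of freedom of the fit (2986) as denominator degrees of freedom. The variable names are hypothetical.

```r
# Values copied from the summary(gam.m3) output above.
npar_df <- c(year = 3, age = 4)
npar_f  <- c(year = 1.086, age = 32.380)
resid_df <- 2986

# Assumption: upper-tail probability under F(npar_df, resid_df).
p_val <- pf(npar_f, npar_df, resid_df, lower.tail = FALSE)

print(signif(p_val, 4))
```

A large value for year is consistent with a linear term being adequate, while a value for age far below any conventional threshold points to a non-linear term.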
medv using rm and crim:
a GAM that excludes rm (\(\mathcal{M}_1\)), a GAM that uses a linear function
of rm (\(\mathcal{M}_2\)), or a GAM that uses a degree-3 smoothing spline
function of rm (\(\mathcal{M}_3\)).
Also add a degree-4 smoothing spline function of crim in each of the models.
Store the models in gam1, gam2, and gam3, respectively.
Assume that: