In order to fit regression splines in R, we use the splines library.
We saw that regression splines can be fit by constructing an appropriate
matrix of basis functions. The bs() function generates the entire matrix of
basis functions for splines with the specified set of knots. By default, cubic
splines are produced. Fitting wage to age using a regression spline is simple:
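library(ISLR2)    # provides the Wage data set
attach(Wage)      # makes age and wage available by name
library(splines)  # provides bs()
# Cubic regression spline with knots at ages 25, 40, and 60
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
# Predict over a grid of integer ages, together with standard errors
age.grid <- seq(from = min(age), to = max(age))
pred <- predict(fit, newdata = list(age = age.grid), se.fit = TRUE)
plot(age, wage, col = "gray")
lines(age.grid, pred$fit, lwd = 2)
# Approximate 95% bands: fitted value plus or minus two standard errors
lines(age.grid, pred$fit + 2 * pred$se.fit, lty = "dashed")
lines(age.grid, pred$fit - 2 * pred$se.fit, lty = "dashed")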
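
The statement that a regression spline is fit by constructing a matrix of basis functions can be checked directly. The following is a minimal sketch on simulated data, so that it runs without the Wage data set; the variables x and y and the fit names are hypothetical stand-ins, not part of the lab. It builds the basis matrix with bs() first, regresses on its columns, and compares the result with the formula interface used above.

```r
library(splines)

# Simulated data (hypothetical; x stands in for age, y for wage).
set.seed(1)
x <- runif(200, min = 18, max = 80)
y <- 50 + 30 * sin(x / 15) + rnorm(200, sd = 5)

# Build the cubic-spline basis matrix explicitly:
# one row per observation, one column per basis function.
B <- bs(x, knots = c(25, 40, 60))

# Ordinary least squares on the columns of B ...
fit.matrix <- lm(y ~ B)
# ... and the same model written with bs() inside the formula.
fit.formula <- lm(y ~ bs(x, knots = c(25, 40, 60)))

# The two fits should agree up to numerical precision.
same.fit <- isTRUE(all.equal(unname(fitted(fit.matrix)),
                             unname(fitted(fit.formula))))
print(dim(B))
print(same.fit)
```

Writing bs() inside the formula is usually the more convenient form, because predict() can then rebuild the basis for new values of the predictor.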

Here we have prespecified knots at ages 25, 40, and 60. This produces a
spline with six basis functions. (Recall that a cubic spline with three knots
has seven degrees of freedom; these degrees of freedom are used up by an
intercept, plus six basis functions.) We could also use the df option to
produce a spline with knots at uniform quantiles of the data.
dim(bs(age, knots = c(25, 40, 60)))
[1] 3000 6
dim(bs(age, df = 6))
[1] 3000 6
attr(bs(age, df = 6), "knots")
[1] 33.75 42.00 51.00
In this case R chooses knots at ages 33.8, 42.0, and 51.0, which correspond
to the 25th, 50th, and 75th percentiles of age. The function bs() also has
a degree argument, so we can fit splines of any degree, rather than the
default degree of 3 (which yields a cubic spline).
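
The counting above can be verified with a short sketch. It again uses a simulated predictor x (a hypothetical stand-in for age) so that it is self-contained. It checks how the number of columns returned by bs() depends on the intercept and degree arguments, and that the knots chosen under df = 6 coincide with the quartiles of the data.

```r
library(splines)

set.seed(1)
x <- runif(500, min = 18, max = 80)  # hypothetical predictor

# Three knots, cubic, no intercept column: 3 + 3 = 6 basis functions.
n.default <- ncol(bs(x, knots = c(25, 40, 60)))

# Including the intercept column recovers all 7 degrees of freedom.
n.intercept <- ncol(bs(x, knots = c(25, 40, 60), intercept = TRUE))

# A quadratic spline with the same knots: 3 + 2 = 5 basis functions.
n.quadratic <- ncol(bs(x, knots = c(25, 40, 60), degree = 2))

# With df = 6 and the default degree of 3, bs() places
# 6 - 3 = 3 knots at uniform quantiles of x, i.e. the quartiles.
knots.df <- attr(bs(x, df = 6), "knots")
knots.q <- quantile(x, probs = c(0.25, 0.50, 0.75))
knots.match <- isTRUE(all.equal(unname(knots.df), unname(knots.q)))

print(c(n.default, n.intercept, n.quadratic))
print(knots.match)
```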
Fit a regression spline with medv as dependent variable and lstat as independent variable.
The spline should be of degree 3 with two knots, at positions 10 and 20.
Store the result in the variable fit.medv. Then compute predictions for a series of values of lstat, ranging from 0 to 50, in steps of 1.
Store the results in variable preds.se.bands. Assume that: