In order to fit a step function, we use the cut() function.

table(cut(age, 4))

(17.9,33.5]   (33.5,49]   (49,64.5] (64.5,80.1] 
        750        1399         779          72 
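When cut() is given a number of intervals rather than explicit cutpoints, it splits the range of the variable into that many equal-width pieces. It then widens the two outer limits by 0.1% of the range so that the minimum and maximum fall inside. The sketch below reproduces this rule on a small hypothetical vector of ages. `age_toy` is not part of Wage; it spans 18 to 80, which is consistent with the interval labels above. `rx`, `dx` and `brk` are likewise illustrative names.

```r
# Hypothetical toy ages (not the Wage data) spanning 18 to 80
age_toy <- c(18, 25, 33, 40, 52, 61, 70, 80)

# Rebuild the breaks that cut(age_toy, 4) computes internally
rx  <- range(age_toy)                         # 18 and 80
dx  <- diff(rx)                               # width of the range: 62
brk <- seq(rx[1], rx[2], length.out = 4 + 1)  # 18.0 33.5 49.0 64.5 80.0
# widen the outer limits by 0.1% of the range so min and max are included
brk[c(1, 5)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

levels(cut(age_toy, 4))
# "(17.9,33.5]" "(33.5,49]" "(49,64.5]" "(64.5,80.1]"

# Passing the rebuilt breaks explicitly gives the same factor
identical(cut(age_toy, 4), cut(age_toy, breaks = brk))
# TRUE
```

The labels are rounded to three significant digits, which is why the outer limits 17.938 and 80.062 print as 17.9 and 80.1.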
        
fit <- lm(wage ~ cut(age, 4), data = Wage)
coef(summary(fit))

                        Estimate Std. Error   t value     Pr(>|t|)
(Intercept)            94.158392   1.476069 63.789970 0.000000e+00
cut(age, 4)(33.5,49]   24.053491   1.829431 13.148074 1.982315e-38
cut(age, 4)(49,64.5]   23.664559   2.067958 11.443444 1.040750e-29
cut(age, 4)(64.5,80.1]  7.640592   4.987424  1.531972 1.256350e-01
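The youngest interval is the omitted baseline. As a result, the fitted mean wage for each age group is the intercept plus that group's coefficient, with nothing added for the baseline. The short check below uses the estimates copied from the table above. `b0`, `b` and `group_fit` are illustrative names, and wage is in thousands of dollars.

```r
# Estimates copied from the coef(summary(fit)) output above
b0 <- 94.158392                          # intercept: baseline group (17.9,33.5]
b  <- c(24.053491, 23.664559, 7.640592)  # (33.5,49], (49,64.5], (64.5,80.1]

# Fitted mean wage per age group: baseline, then baseline + offset
group_fit <- c(b0, b0 + b)
names(group_fit) <- c("(17.9,33.5]", "(33.5,49]", "(49,64.5]", "(64.5,80.1]")
round(group_fit, 2)
# approx. 94.16, 118.21, 117.82, 101.80
```

Because the model has one free parameter per group, these fitted values coincide with the sample mean wage within each age interval.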

Here cut() automatically split the range of age into four equal-width intervals, with cutpoints at 33.5, 49, and 64.5 years of age. We could also have specified our own cutpoints directly using the breaks argument. The function cut() returns a factor whose levels are the intervals (an unordered factor by default, although its levels follow the order of the intervals), and lm() then creates a set of dummy variables for use in the regression. The lowest category, (17.9,33.5], is left out as the baseline. The intercept of 94.16 can therefore be interpreted as the average salary for those under 33.5 years of age, about $94,160, since wage is measured in thousands of dollars. Each other coefficient can be interpreted as the average additional salary for that age group relative to this youngest group. We can produce predictions and plots just as we did in the case of the polynomial fit.
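The points about the breaks argument, dummy coding, and prediction can be checked together in a small sketch. It uses simulated data so that it runs without the package that provides Wage. The data frame `dat` and the coefficients used to generate it are invented. The hypothetical cutpoints are 33.5, 49, and 64.5, with -Inf and Inf as outer limits so that every age falls into some interval. Supplying breaks explicitly also matters for predict(). With cut(age, 4) in the formula, cut() is re-evaluated on newdata, so its cutpoints would be recomputed from the range of the new ages instead of reused from the fit.

```r
set.seed(1)
# Simulated stand-in for Wage (hypothetical values): integer ages 18-80
n   <- 500
dat <- data.frame(age = sample(18:80, n, replace = TRUE))
dat$wage <- 90 + 20 * (dat$age > 33) - 15 * (dat$age > 64) + rnorm(n, sd = 10)

# Our own cutpoints via breaks; -Inf/Inf make the outer intervals open-ended
brk     <- c(-Inf, 33.5, 49, 64.5, Inf)
age_grp <- cut(dat$age, breaks = brk)
fit     <- lm(wage ~ cut(age, breaks = brk), data = dat)

# cut() returns an unordered factor by default, so lm() uses treatment
# (dummy) coding with the first interval as the omitted baseline
is.ordered(age_grp)   # FALSE
head(model.matrix(fit))

# Intercept = mean wage in the baseline interval; every other coefficient
# = that interval's mean wage minus the baseline mean
grp_mean <- as.numeric(tapply(dat$wage, age_grp, mean))
cf       <- as.numeric(coef(fit))
all.equal(cf[1], grp_mean[1])                  # TRUE
all.equal(cf[-1], grp_mean[-1] - grp_mean[1])  # TRUE

# Predictions as for any lm fit: cut() is re-applied to the new ages with
# the same fixed breaks, so each age lands in the interval used in fitting
new_ages <- data.frame(age = c(25, 40, 60, 75))
pred     <- predict(fit, newdata = new_ages)
pred
```

The four new ages fall in the four successive intervals, so their predictions are simply the four group means.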

Questions

Fit a linear model of medv using a step function of dis. Instead of giving the number of intervals, set 5 cut points by passing a numeric vector to the breaks argument; five cut points define four intervals. The cut points should be placed at 1, 3, 6, 9, and 12. Store the result in the variable fit.
MC1: What is the expected medv for a neighbourhood with a dis value of 8?
- 1: 19.73
- 2: 25.15
- 3: 25.07
- 4: 22.45

Assume that:

- The MASS library has been loaded.
- The Boston dataset has been loaded and attached.