The Wage data set contains a number of other features not explored in this chapter, such as marital status (maritl) and job class (jobclass). In this exercise, we explore the relationships between these predictors and wage, and use non-linear fitting techniques in order to fit flexible models to the data.

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Inspect summary statistics of the categorical variables maritl and jobclass. For each variable, create a boxplot of wage by category to compare typical wages across the categories.

    • MC1:
      On average, people with which marital status earn the most?
      • 1: Never Married
      • 2: Married
      • 3: Widowed
      • 4: Divorced
      • 5: Separated

    • MC2:
      On average, people in which job class earn the most?
      • 1: Industrial
      • 2: Information

  2. Fit three GAMs and perform an ANOVA test to see which model fits the data best:

    1. All three models (fit1, fit2, fit3) include a local regression term for year with a span of 0.7, a smoothing spline for age with 5 degrees of freedom, and a linear function of education.
    2. Leave fit1 as it is.
    3. To fit2, add a linear function of jobclass.
    4. To fit3, add linear functions of both jobclass and maritl.
    5. Perform an ANOVA on the three models. Store the resulting object in anova.wage and inspect the results.

    • MC3:
      For each pair of successive (nested) models, check the null hypothesis that the simpler model \(\mathcal{M}_1\) is sufficient to explain the data against the alternative hypothesis that the more complex model \(\mathcal{M}_2\) is required. Which model would you use?
      • 1: fit1
      • 2: fit2
      • 3: fit3
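
The two questions above can be approached with a sketch along the following lines. It assumes that the Wage data come from the ISLR package and that the models are fitted with gam(), lo() and s() from the gam package; neither package is named in the exercise, so treat both as assumptions. The helper names mean.by.maritl and mean.by.jobclass are hypothetical, and the choice of an F-test in anova() is also an assumption rather than a requirement stated above.

```r
# Assumption: the Wage data set is taken from the ISLR package.
library(ISLR)
# Assumption: gam(), lo() and s() come from the gam package.
library(gam)

# Question 1: summary statistics of the two categorical variables.
summary(Wage$maritl)
summary(Wage$jobclass)

# Boxplots of wage per category. Note that the thick line in a boxplot
# is the median, not the mean.
boxplot(wage ~ maritl, data = Wage)
boxplot(wage ~ jobclass, data = Wage)

# Mean wage per category (hypothetical helper names).
mean.by.maritl <- tapply(Wage$wage, Wage$maritl, mean)
mean.by.jobclass <- tapply(Wage$wage, Wage$jobclass, mean)

# Question 2: three nested GAMs sharing the same base terms.
fit1 <- gam(wage ~ lo(year, span = 0.7) + s(age, df = 5) + education,
            data = Wage)
fit2 <- gam(wage ~ lo(year, span = 0.7) + s(age, df = 5) + education +
              jobclass,
            data = Wage)
fit3 <- gam(wage ~ lo(year, span = 0.7) + s(age, df = 5) + education +
              jobclass + maritl,
            data = Wage)

# Sequential comparison of the nested models.
# Assumption: an F-test is used; the exercise does not specify the test.
anova.wage <- anova(fit1, fit2, fit3, test = "F")
anova.wage
```

Each row of anova.wage after the first compares a model with the one before it, so a small p-value in a row suggests that the term or terms added at that step improve the fit.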


Assume that: