In this problem, you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto data set.

Questions

Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.

  1. Do the following preprocessing steps:

    1. Create a binary variable, mpg01, that is 1 when mpg is above its median and 0 when mpg is below its median. You can compute the median using the median() function.

    2. Use the data.frame() function to create a single data set containing both mpg01 and the other Auto variables. Add mpg01 as the last column of the new data set. Store the result in the variable data.

  2. Explore the data graphically in order to investigate the association between mpg01 and the other features. Which of the other features seem most likely to be useful in predicting mpg01? For example, you can make pairwise scatterplots with pairs().

  3. Do a train-test split:

    1. Split the data into a training set and a test set with the sample() function. Put 60% of the data (235 rows) in the training set and the remaining 40% in the test set. Use a seed value of 1. Store the indices of the training set in train.
    2. Create a hold-out data set data.test that contains only the test observations (both the dependent and the independent variables).
    3. Create a hold-out vector mpg01.test that contains the dependent variable mpg01 for the test observations only.

  4. Perform LDA on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower. What is the test accuracy of the model obtained? Store the model in lda.fit, the predictions in lda.pred and the test accuracy in lda.acc.
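
The workflow in the questions above (binary response, 60/40 split, LDA fit, test accuracy) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses R's built-in mtcars data as a stand-in, not the Auto data set, so the column names (cyl, wt, disp, hp instead of cylinders, weight, displacement, horsepower) and the row counts differ from the exercise, and the resulting accuracy says nothing about the expected answer. The name cars and the helper variable n.train are hypothetical. The graphical exploration of question 2 is left out.

```r
# Stand-in data: the built-in mtcars set, NOT the Auto data set of the exercise.
library(MASS)  # provides lda()

cars <- mtcars  # hypothetical name for the stand-in data

# Binary response: 1 if mpg is above its median, 0 otherwise.
mpg01 <- ifelse(cars$mpg > median(cars$mpg), 1, 0)

# Combine into one data frame; mpg01 ends up as the last column.
data <- data.frame(cars, mpg01)

# 60/40 train-test split with a fixed seed for reproducibility.
set.seed(1)
n.train <- floor(0.6 * nrow(data))      # assumption: round the 60% down
train <- sample(nrow(data), n.train)    # row indices of the training set
data.test <- data[-train, ]             # held-out rows, all columns
mpg01.test <- data$mpg01[-train]        # held-out response only

# Fit LDA on the training rows and predict the held-out rows.
lda.fit <- lda(mpg01 ~ cyl + wt + disp + hp, data = data, subset = train)
lda.pred <- predict(lda.fit, data.test)$class

# Test accuracy: share of held-out observations classified correctly.
lda.acc <- mean(lda.pred == mpg01.test)
```

Passing subset = train to lda() keeps the full data frame in one place and avoids building a separate training copy; indexing with -train then selects exactly the rows that were not sampled.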


Assume that: