In this problem, you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto data set.
Some of the exercises are not tested by Dodona (for example the plots), but it is still useful to try them.
Do the following preprocessing steps:
Create a binary variable, mpg01, that contains a 1 if mpg contains a value above its median, and a 0 if mpg contains a value below its median. You can compute the median using the median() function.
Use the data.frame() function to create a single data set containing both mpg01 and the other Auto variables. Add mpg01 as the last column in the new data set. Store the result in the variable data.
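The preprocessing steps above could be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes the ISLR2 package is installed, and it codes any value exactly equal to the median as 0, which the exercise text does not specify.

```r
library(ISLR2)  # provides the Auto data set

# 1 if mpg is above its median, 0 otherwise
# (assumption: values equal to the median are coded as 0)
mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)

# Combine the original Auto variables with mpg01 as the last column
data <- data.frame(Auto, mpg01)

# Quick check: class counts and the name of the last column
table(data$mpg01)
names(data)[ncol(data)]
```

Passing the vector as a bare name to data.frame() makes the new column take the name mpg01 automatically.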
Explore the data graphically in order to investigate the association between mpg01 and the other features.
Which of the other features seem most likely to be useful in predicting mpg01?
For example, you can make pairwise scatterplots with pairs().
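One possible way to explore these associations is sketched below. It rebuilds data so that it runs on its own. The choice of boxplots and of the four variables shown is an illustrative assumption, not something the exercise prescribes.

```r
library(ISLR2)  # provides the Auto data set

mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
data <- data.frame(Auto, mpg01)

# Pairwise scatterplots of all columns except the car name
pairs(data[, names(data) != "name"])

# Boxplots of a few candidate predictors, split by mpg01
# (the selection of variables here is only an example)
par(mfrow = c(2, 2))
for (v in c("cylinders", "displacement", "horsepower", "weight")) {
  boxplot(split(data[[v]], data$mpg01), xlab = "mpg01", ylab = v)
}
par(mfrow = c(1, 1))

# Numeric complement to the plots: correlation of each numeric column with mpg01
num.cols <- sapply(data, is.numeric)
round(cor(data[, num.cols])["mpg01", ], 2)
```

Features whose boxplots barely overlap between the two classes are natural candidates for predictors.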
Do a train-test split:
Use the sample() function.
Take 60% of the data (235 rows) in the training set and the other 40% in the test set. Use a seed value of 1.
Store the indices of the training set in train.
Create data.test, which contains only the test observations (dependent + independent variables).
Create mpg01.test, which contains only the mpg01 values of the test observations.
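A split along these lines might look like the sketch below. The exact form of the sample() call that the grader expects is not stated in the exercise, so the call used here is an assumption. The block rebuilds data so that it is self-contained.

```r
library(ISLR2)  # provides the Auto data set

mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
data <- data.frame(Auto, mpg01)

set.seed(1)                          # seed value requested by the exercise
train <- sample(nrow(data), 235)     # row indices of the training set (assumed call form)

data.test <- data[-train, ]          # all columns, test rows only
mpg01.test <- data$mpg01[-train]     # response values for the test rows

length(train)
nrow(data.test)
```

Negative indexing with -train selects every row that was not sampled, so the training and test sets do not overlap.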
Perform LDA on the training data in order to predict mpg01 using the variables cylinders, weight, displacement and horsepower.
What is the test accuracy of the model obtained?
Store the model in lda.fit, the predictions in lda.pred and the test accuracy in lda.acc.
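The LDA step could be sketched as follows, assuming the ISLR2 and MASS packages are installed. Two choices here are assumptions rather than requirements from the exercise: lda.pred holds only the predicted class labels (not the full predict() output), and the split is produced with sample(nrow(data), 235).

```r
library(ISLR2)  # provides the Auto data set
library(MASS)   # provides lda()

mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
data <- data.frame(Auto, mpg01)

set.seed(1)
train <- sample(nrow(data), 235)     # assumed form of the split
data.test <- data[-train, ]
mpg01.test <- data$mpg01[-train]

# Fit LDA on the training rows only
lda.fit <- lda(mpg01 ~ cylinders + weight + displacement + horsepower,
               data = data, subset = train)

# Predicted class labels for the test rows (assumption: labels only)
lda.pred <- predict(lda.fit, data.test)$class

# Test accuracy: fraction of test rows classified correctly
lda.acc <- mean(lda.pred == mpg01.test)
lda.acc
```

A confusion matrix from table(lda.pred, mpg01.test) shows which class accounts for the misclassifications.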
Assume that:
The ISLR2 and MASS libraries have been loaded.
The Auto dataset has been loaded and attached.