We continue with the biopsy dataset.

Previously, we created a train and test set by using a logical condition (all the rows before 2004 and all the rows after 2005). In this dataset we do not have a time variable that we can use to split the data logically into a train and test set. Remember from the section on Indexing¹ that we can use the negative sign - to keep all rows or columns except those indicated in the index. We can utilise this feature of R to create a train and test set easily. Here, we put the first 400 rows in the train set and all the remaining rows in the test set:

train <- 1:400
train.X <- biopsy[train, 1:9]
test.X <- biopsy[-train, 1:9]
train.Y <- biopsy[train, 10]
test.Y <- biopsy[-train, 10]

In addition, we separate the predictors X (columns 1 to 9) from the response Y (column 10).
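To see the negative index at work on something smaller than biopsy, here is a minimal sketch. The vector x, the index keep and the data frame toy are made up for illustration and are not part of the exercise.

```r
# A made-up vector and index, for illustration only
x <- c(10, 20, 30, 40, 50)
keep <- 1:3

x[keep]    # elements 1 to 3
x[-keep]   # every element except 1 to 3

# The same idea on the rows of a made-up data frame
toy <- data.frame(a = 1:5, b = letters[1:5], y = c(0, 1, 0, 1, 1))
toy.train <- toy[keep, ]    # rows 1 to 3
toy.test  <- toy[-keep, ]   # the remaining rows

# The two parts do not overlap and together cover every row
nrow(toy.train) + nrow(toy.test) == nrow(toy)
```

Because keep and -keep are complementary, no row can end up in both the train and the test set.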

Questions

Create a vector train with all the numbers from 1 to 400
Create a subset of biopsy with only the predictors and the rows with the index not in train and store it in test.X
Create a subset of biopsy with only the response and the rows with the index not in train and store it in test.Y
Create a Logistic Regression model to predict class with the attributes V1 through V9, use the train vector for the subset parameter, and store it in glm.fit
Predict the probabilities that a patient has a malignant tumour for the test.X subset and store it in glm.probs
Create a vector glm.pred that contains the predicted values of the glm.fit model you created previously (use a cut-off point of 0.5)
Create a table that compares the predicted results with the actual diagnosis and store it in glm.table
Calculate the accuracy of the model and store it in glm.acc
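The steps above can be sketched as follows. This is one possible approach, not the only valid answer. It assumes that class is a factor with the levels benign and malignant (so the fitted probabilities refer to malignant), rebuilds the starting state itself so that it runs on its own, and passes data = biopsy explicitly instead of relying on the attached dataset.

```r
library(MASS)

# Recreate the assumed starting state: no NA rows, no ID column
biopsy <- na.omit(MASS::biopsy)[, -1]

# Row indices of the train set
train <- 1:400

# Test set: predictors (columns 1 to 9) and response (column 10)
test.X <- biopsy[-train, 1:9]
test.Y <- biopsy[-train, 10]

# Logistic regression of class on V1 to V9, fitted on the train rows only
glm.fit <- glm(class ~ ., data = biopsy, family = binomial, subset = train)

# Predicted probability of a malignant tumour for each test row
glm.probs <- predict(glm.fit, newdata = test.X, type = "response")

# Turn probabilities into labels with a cut-off point of 0.5
glm.pred <- rep("benign", length(glm.probs))
glm.pred[glm.probs > 0.5] <- "malignant"

# Compare the predictions with the actual diagnosis
glm.table <- table(glm.pred, test.Y)

# Accuracy: the share of test rows that were predicted correctly
glm.acc <- mean(glm.pred == test.Y)
```

The formula class ~ . uses every remaining column as a predictor, which here means V1 through V9. Writing the nine variables out explicitly would give the same model.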

Assume that:

The MASS library has been loaded
The biopsy dataset has been loaded and attached
The rows with NA values have been dropped
The ID column has been dropped
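If you want to reproduce this starting state yourself, the following is a minimal sketch. It assumes the copy of biopsy that ships with the MASS package, in which the ID is the first column.

```r
library(MASS)                     # provides the biopsy data frame

biopsy <- MASS::biopsy            # work on a local copy of the dataset
biopsy <- na.omit(biopsy)         # drop the rows with NA values
biopsy <- biopsy[, -1]            # drop the ID column (the first column)

dim(biopsy)                       # number of rows and columns that remain
names(biopsy)                     # V1 to V9 followed by class
```

After these steps the predictors sit in columns 1 to 9 and the response class sits in column 10, which matches the column indices used in the split. Call attach(biopsy) afterwards if you also want to refer to the columns by name without the data frame prefix.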