We continue with the biopsy dataset.
Previously we created a train and test set by using a logical condition (all the rows before 2004 and all the rows after 2005).
In this dataset we do not have a time variable we can use to logically split our data into a train and test set.
Remember from the section on Indexing that we can use the negative sign - to keep all rows or columns except those indicated in the index.
We can utilise this feature of R to easily create a train and test set. Here, we select the first 400 rows to be in our train set and all the remaining rows to be in our test set:
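# Row indices of the train set: the first 400 rows
train <- 1:400
# Predictors (columns 1 to 9) for the train rows and for all remaining rows
train.X <- biopsy[train, 1:9]
test.X <- biopsy[-train, 1:9]
# Response (column 10) for the train rows and for all remaining rows
train.Y <- biopsy[train, 10]
test.Y <- biopsy[-train, 10]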
Additionally, we separate the predictors X (columns 1 to 9) from the response Y (column 10).
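To make the negative-index split concrete, here is a minimal sketch on a small made-up data frame. The names toy, toy.train and toy.test are hypothetical and used only for illustration; they are not part of the exercise.

```r
# Hypothetical toy data frame, used only to illustrate the indexing
toy <- data.frame(x = 1:6, y = c("a", "b", "a", "b", "a", "b"))

train <- 1:4
toy.train <- toy[train, ]    # rows 1 to 4
toy.test  <- toy[-train, ]   # every row except 1 to 4, i.e. rows 5 and 6

nrow(toy.train)  # 4
nrow(toy.test)   # 2
```

Because the same train vector is used with and without the minus sign, the two subsets cannot overlap and together cover every row.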
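- Create the vector train with all the numbers from 1 to 400
- Create the predictor test set from the rows not in train and store it in test.X
- Create the response test set from the rows not in train and store it in test.Y
- Fit a logistic regression model that predicts class with the attributes V1 through V9, use the train vector for the subset parameter, and store it in glm.fit
- Predict on the test.X subset and store it in glm.probs
- Create a vector glm.pred that contains the predicted values of the glm.fit model you created previously (use a cut-off point of 0.5)
- Tabulate glm.pred against test.Y and store the result in glm.table
- Compute the accuracy and store it in glm.acc

Assume that:
- MASS library has been loaded
- biopsy dataset has been loaded and attached
- ID column has been dropped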
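The steps above could be sketched roughly as follows. This is one possible approach under stated assumptions, not a definitive solution: it assumes the model is fitted with glm() and family = binomial, that glm.table is meant to be a confusion table of predictions against test.Y, and that glm.acc is the share of correct predictions. It loads the data from MASS directly so that it runs on its own.

```r
# Take biopsy from MASS and drop the ID column, leaving V1..V9 and class
biopsy <- MASS::biopsy[, -1]

train <- 1:400
test.X <- biopsy[-train, 1:9]
test.Y <- biopsy[-train, 10]

# Logistic regression of class on V1..V9, fitted on the train rows only
glm.fit <- glm(class ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
               data = biopsy, family = binomial, subset = train)

# Predicted probability of the second factor level of class ("malignant")
glm.probs <- predict(glm.fit, newdata = test.X, type = "response")

# Apply the 0.5 cut-off; rows with a missing predictor stay NA
glm.pred <- ifelse(glm.probs > 0.5, "malignant", "benign")

# Assumed meaning: confusion table and proportion of correct predictions
glm.table <- table(glm.pred, test.Y)
glm.acc <- mean(glm.pred == test.Y, na.rm = TRUE)
```

Column V6 of biopsy contains some missing values, so a few predictions come out as NA. Here table() leaves them out by default and na.rm = TRUE excludes them from the accuracy; whether that is the intended handling is an assumption.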