Continuing with the biopsy dataset.
Previously, we created a train and test set using a logical condition (all the rows before 2004 and all the rows after 2005).
In the biopsy dataset, however, there is no time variable we could use to split the data logically into a train and test set.
Remember from the section on Indexing that we can use the negative sign (-) to keep all rows or columns except those indicated in the index.
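To recall how negative indexing behaves, here is a minimal sketch on a small made-up vector and data frame. The names x, idx and df are illustrative only and are not part of the biopsy exercise.

```r
# A toy vector to illustrate positive vs. negative indexing
x <- c(10, 20, 30, 40, 50)
idx <- 1:2

x[idx]   # keeps elements 1 and 2: 10 20
x[-idx]  # drops elements 1 and 2: 30 40 50

# The same works for the rows of a data frame: all rows except 1 and 2 are kept
df <- data.frame(a = 1:5, b = letters[1:5])
df[-idx, ]
```

Because x[idx] and x[-idx] never share an element and together contain every element, a single index vector is enough to define both halves of a split.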
We can utilise this feature of R to easily create a train and test set. Here, we select the first 400 rows to be in our train set and all the remaining rows to be in our test set:
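train <- 1:400                  # row indices of the training set
train.X <- biopsy[train, 1:9]   # predictors V1-V9, first 400 rows
test.X <- biopsy[-train, 1:9]   # predictors V1-V9, all remaining rows
train.Y <- biopsy[train, 10]    # response (class), first 400 rows
test.Y <- biopsy[-train, 10]    # response (class), all remaining rows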
Additionally, we separate the predictors X (columns 1 to 9, i.e. V1 to V9) from the response Y (column 10, class).
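As a quick check of this split, the minimal sketch below rebuilds it from scratch and verifies that the train and test rows are disjoint and together cover the whole dataset. It assumes the MASS package is installed (it ships with standard R distributions) and drops the ID column itself, as this exercise assumes has already been done.

```r
library(MASS)  # provides the biopsy dataset

# Local copy without the ID column, so that columns 1-9 are V1-V9
# and column 10 is the response class
biopsy <- MASS::biopsy
biopsy$ID <- NULL

train <- 1:400
train.X <- biopsy[train, 1:9]
test.X  <- biopsy[-train, 1:9]
train.Y <- biopsy[train, 10]
test.Y  <- biopsy[-train, 10]

nrow(train.X)                                   # 400
nrow(train.X) + nrow(test.X) == nrow(biopsy)    # TRUE: every row is used
intersect(rownames(train.X), rownames(test.X))  # character(0): no overlap
```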
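Instructions:
1. Create a vector train with all the numbers from 1 to 400.
2. Use train to select the predictors of all the remaining rows and store them in test.X.
3. Use train to select the response of all the remaining rows and store it in test.Y.
4. Fit a logistic regression model that predicts class with the attributes V1 through V9, use the train vector for the subset parameter, and store it in glm.fit.
5. Predict the probabilities for the test.X subset and store them in glm.probs.
6. Create a vector glm.pred that contains the predicted values of the glm.fit model you created previously (use a cut-off point of 0.5).
7. Create a confusion table of the predictions and store it in glm.table.
8. Compute the accuracy of the predictions and store it in glm.acc.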
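One possible way to carry out these steps is sketched below. It is a hedged sketch, not the official solution: it recreates the assumed setup itself (loading MASS, dropping ID) instead of relying on attach(), spells out V1 through V9 in the formula, and computes the accuracy from the confusion table. The labels "benign" and "malignant" are the two levels of class in MASS::biopsy; glm models the probability of the second level, so probabilities above 0.5 map to "malignant".

```r
library(MASS)  # provides the biopsy dataset

# Setup assumed by the exercise: biopsy without its ID column
biopsy <- MASS::biopsy
biopsy$ID <- NULL

# Row indices of the training set; the test set is "everything else"
train <- 1:400
test.X <- biopsy[-train, 1:9]
test.Y <- biopsy[-train, 10]

# Logistic regression of class on V1-V9, fitted on the training rows only
glm.fit <- glm(class ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
               data = biopsy, family = binomial, subset = train)

# Predicted probabilities of "malignant" for the test rows
glm.probs <- predict(glm.fit, test.X, type = "response")

# Classify with a cut-off of 0.5; rows with a missing V6 get NA
glm.pred <- ifelse(glm.probs > 0.5, "malignant", "benign")
glm.pred <- factor(glm.pred, levels = levels(test.Y))

# Confusion table: predictions in rows, true classes in columns
glm.table <- table(glm.pred, test.Y)

# Accuracy: share of correctly classified test rows (NA rows are not counted)
glm.acc <- sum(diag(glm.table)) / sum(glm.table)
glm.table
glm.acc
```

Converting glm.pred to a factor with the same levels as test.Y keeps the table square even if one class is never predicted, so diag() always lines up correct predictions. An alternative is mean(glm.pred == test.Y, na.rm = TRUE), which gives the same value.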
Assume that:
- the MASS library has been loaded;
- the biopsy dataset has been loaded and attached;
- the ID column has been dropped.