We continue with the biopsy dataset.

Previously, we created a train and test set using a logical condition (all the rows before 2004 and all the rows after 2005). This dataset has no time variable that we could use to split the data in the same way. Recall from the section on Indexing1 that the negative sign - keeps all rows or columns except those given in the index. We can use this feature of R to create a train and test set easily. Here, we select the first 300 rows for the train set and all the remaining rows for the test set:

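train <- 1:300                  # row indices of the train set
train.X <- biopsy[train, 1:9]   # predictors, train rows
test.X <- biopsy[-train, 1:9]   # predictors, all rows except the train rows
train.Y <- biopsy[train, 10]    # response, train rows
test.Y <- biopsy[-train, 10]    # response, all rows except the train rows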

Note that the code also separates the predictors X (columns 1 to 9) from the response Y (column 10).
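To see how the negative index behaves, the same pattern can be tried on a small made-up data frame. This is only a sketch: the data frame toy, its column names and its values are invented for illustration and are not part of the biopsy data.

```r
# Hypothetical toy data: two predictors and one response (not the biopsy data)
toy <- data.frame(x1 = 1:10, x2 = (1:10)^2, y = rep(c("a", "b"), 5))

train <- 1:6                   # row indices of the train set

train.X <- toy[train, 1:2]     # predictors, rows 1 to 6
test.X  <- toy[-train, 1:2]    # predictors, every row NOT in train (7 to 10)
train.Y <- toy[train, 3]       # response, rows 1 to 6
test.Y  <- toy[-train, 3]      # response, rows 7 to 10

# The two sets share no rows and together cover the whole data frame
nrow(train.X) + nrow(test.X) == nrow(toy)
intersect(rownames(train.X), rownames(test.X))
```

The first check returns TRUE and the second returns an empty character vector, so no row ends up in both sets. If the rows were stored in some meaningful order, a random set of indices such as train <- sample(nrow(toy), 6) could be used in place of 1:6; the negative-index pattern stays the same.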

Questions

Assume that: