We now perform PCR on the training data and evaluate its test set performance.
set.seed(1)
pcr.fit <- pcr(Salary ~ ., data = Hitters, subset = train, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")
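The validation plot is read by eye, but the same minimum can be pulled out numerically. The following is a minimal sketch, assuming the pls package is installed; the helper name cv_best_ncomp and the use of the built-in mtcars data as a stand-in are illustrative assumptions, not part of the lab.

```r
library(pls)

# Hypothetical helper: number of components that minimizes the
# cross-validated MSEP of a model fit with validation = "CV".
cv_best_ncomp <- function(fit) {
  # $val is an array indexed by (estimate, response, model);
  # the first model is the intercept-only fit (0 components).
  cv <- MSEP(fit, estimate = "CV")$val[1, 1, ]
  unname(which.min(cv)) - 1
}

set.seed(1)
# Stand-in data: mtcars ships with R; mpg is used as the response.
demo.fit <- pcr(mpg ~ ., data = mtcars, scale = TRUE, validation = "CV")
cv_best_ncomp(demo.fit)
```

Applied to a fit such as pcr.fit above, the same helper would return the component count at the lowest point of the plotted curve.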
Now we find that the lowest cross-validation error occurs when M = 5 components are used. We compute the test MSE as follows.
> pcr.pred <- predict(pcr.fit, x[test,], ncomp = 5)
> mean((pcr.pred - y.test)^2)
[1] 142812
This test set
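To make the test MSE computation concrete, here is a rough base R sketch of what fitting PCR on a training set and predicting on a test set amounts to conceptually. The mtcars data, the 20/12 split, and M = 3 are illustrative assumptions; this is not the internal code of pcr() or predict().

```r
# PCR by hand: standardize, extract principal components, regress on scores.
set.seed(1)
X <- as.matrix(mtcars[, -1])   # predictors (assumed stand-in data)
y <- mtcars$mpg                # response
train <- sample(nrow(X), 20)   # hypothetical training indices
M <- 3                         # hypothetical number of components

# Standardize using TRAINING means and standard deviations only.
mu <- colMeans(X[train, ])
s  <- apply(X[train, ], 2, sd)
Z_train <- scale(X[train, ], center = mu, scale = s)
Z_test  <- scale(X[-train, ], center = mu, scale = s)

# Principal component directions from the training predictors.
pc <- prcomp(Z_train, center = FALSE, scale. = FALSE)
V  <- pc$rotation[, 1:M, drop = FALSE]

# Regress the response on the first M score vectors.
scores_train <- Z_train %*% V
fit <- lm(y[train] ~ scores_train)

# Project the test predictors onto the same directions and predict.
scores_test <- Z_test %*% V
pred <- cbind(1, scores_test) %*% coef(fit)

# Test MSE, computed the same way as in the lab.
test_mse <- mean((pred - y[-train])^2)
test_mse
```

The key design point is that the test rows are centered, scaled, and projected with quantities learned from the training rows only, so no test information leaks into the fit.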
Finally, we fit PCR on the full data set, using M = 5, the number of components identified by cross-validation.
> pcr.fit <- pcr(y ~ x, scale = TRUE, ncomp = 5)
> summary(pcr.fit)
Data:   X dimension: 263 19
        Y dimension: 263 1
Fit method: svdpc
Number of components considered: 5
TRAINING: % variance explained
   1 comps  2 comps  3 comps  4 comps  5 comps
X    38.31    60.16    70.84    79.03    84.29
y    40.63    41.58    42.17    43.22    44.90
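The two rows of the summary are cumulative percentages. As a hedged illustration of where such numbers come from, the base R sketch below reproduces the same kind of table by hand: the X row from the principal component variances, the y row from the training R-squared of a regression on the first m scores. The mtcars data is an assumed stand-in, so the values will differ from those above.

```r
# Cumulative % variance explained, computed by hand.
X <- scale(as.matrix(mtcars[, -1]))  # standardized predictors (stand-in data)
y <- mtcars$mpg

pc <- prcomp(X)

# X row: cumulative share of total predictor variance captured by the
# first m components.
pct_x <- 100 * cumsum(pc$sdev^2) / sum(pc$sdev^2)

# y row: training R-squared (as a percentage) from regressing y on the
# first m score vectors.
pct_y <- sapply(1:5, function(m) {
  100 * summary(lm(y ~ pc$x[, 1:m, drop = FALSE]))$r.squared
})

round(rbind(X = pct_x[1:5], y = pct_y), 2)
```

Both rows can only increase as components are added, which is why a high X percentage does not by itself imply that the response is well explained.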
Using the Boston dataset, create a PCR model using the training set provided below and store it in pcr.fit (set the ncomp parameter to 10)
Use the MSEP(pcr.fit) command to see the exact values of the
Use the summary() function to see the
pcr.pred
pcr.mse
Assume that:
MASS and pls libraries have been loaded
Boston dataset has been loaded and attached
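One way the steps above might fit together is sketched below. Several details are assumptions, because the exercise's own training set is not shown here: the random half split is hypothetical, and medv is assumed to be the response.

```r
library(MASS)  # provides the Boston data
library(pls)   # provides pcr() and MSEP()

set.seed(1)
# Hypothetical split: stands in for the training set the exercise provides.
train <- sample(nrow(Boston), nrow(Boston) / 2)

# Assumption: medv (median home value) is the response.
pcr.fit <- pcr(medv ~ ., data = Boston, subset = train,
               scale = TRUE, ncomp = 10)

MSEP(pcr.fit)     # training MSEP for 0 to 10 components
summary(pcr.fit)  # cumulative % variance explained in X and medv

# Predict on the held-out rows with all 10 components, then compute test MSE.
pcr.pred <- predict(pcr.fit, Boston[-train, ], ncomp = 10)
pcr.mse  <- mean((pcr.pred - Boston$medv[-train])^2)
pcr.mse
```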