
We implement partial least squares (PLS) using the plsr() function, also in the pls library. The syntax is just like that of the pcr() function.

> set.seed(1)
> pls.fit <- plsr(Salary ~ ., data = Hitters, subset = train, scale = TRUE, validation = "CV")
> summary(pls.fit)
Data: 	X dimension: 131 19 
	Y dimension: 131 1
Fit method: kernelpls
Number of components considered: 19

VALIDATION: RMSEP
Cross-validated using 10 random segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps
CV           428.3    325.5    329.9    328.8    339.0
adjCV        428.3    325.0    328.2    327.2    336.6
...

TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps
X         39.13    48.80    60.09    75.07    78.58
Salary    46.36    50.72    52.23    53.03    54.07
...

One can also plot the cross-validation scores using the validationplot() function. Using val.type="MSEP" will cause the cross-validation MSE to be plotted:

> validationplot(pls.fit, val.type = "MSEP")

[Plot: cross-validation MSEP for pls.fit against the number of PLS components]
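The minimum of the cross-validation curve can also be read off numerically instead of from the plot. The following is a minimal sketch, not part of the original lab: it uses the built-in mtcars data as a stand-in so that it runs on its own, and the names fit, cv.rmsep, and best.ncomp are illustrative. With the lab's objects, pls.fit would take the place of fit.

```r
library(pls)  # assumes the pls package is installed

set.seed(1)
# mtcars is a stand-in data set so the sketch is self-contained;
# in the lab, use the existing pls.fit object instead of refitting.
fit <- plsr(mpg ~ ., data = mtcars, scale = TRUE, validation = "CV")

# RMSEP() returns an array indexed by estimate, response, and model size.
cv.rmsep <- RMSEP(fit, estimate = "CV")$val[1, 1, ]

# The first entry is the intercept-only model, so subtract 1 to turn
# the position of the minimum into a number of components.
best.ncomp <- unname(which.min(cv.rmsep)) - 1
best.ncomp
```

Since RMSEP is the square root of MSEP, both measures are minimized at the same number of components.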

The lowest cross-validation error occurs when only \(M = 1\) partial least squares direction is used. We now evaluate the corresponding test set MSE.

> pls.pred <- predict(pls.fit, x[test,], ncomp = 1)
> mean((pls.pred - y.test)^2)
[1] 151995

The test MSE is comparable to, but slightly higher than, the test MSE obtained using ridge regression, the lasso, and PCR.
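Because Salary in the Hitters data is recorded in thousands of dollars, the test MSE is in squared units and is hard to interpret directly. As a small illustration (the variable names test.mse and test.rmse are ours, not the lab's), taking the square root puts the error back on the scale of Salary and of the RMSEP values reported by summary():

```r
# Test MSE reported above for the one-component PLS fit.
test.mse <- 151995

# The square root is in the same units as Salary (thousands of dollars).
test.rmse <- sqrt(test.mse)
round(test.rmse, 1)
```

The result is roughly 390, so a typical test-set prediction is off by about $390,000.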

Questions


Assume that: