Finally, we perform PLS on the full data set with \(M = 1\), the number of components identified by cross-validation.
> pls.fit <- plsr(Salary ~ ., data = Hitters, scale = TRUE, ncomp = 1)
> summary(pls.fit)
Data:   X dimension: 263 19
        Y dimension: 263 1
Fit method: kernelpls
Number of components considered: 1
TRAINING: % variance explained
        1 comps
X         38.08
Salary    43.05
Notice that the percentage of variance in Salary that the one-component PLS fit explains, 43.05%, is almost as much as that explained by the final five-component PCR fit, 44.90%. This is because PCR only attempts to maximize the amount of variance explained in the predictors, while PLS searches for directions that explain variance in both the predictors and the response.
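To see why a single PLS direction can rival several principal components, the following is a minimal sketch in base R on simulated data (not the Hitters data). All variable names here (z, x1, x2, x3, pc1, t1, and so on) are hypothetical and chosen for illustration. It assumes the standard result that, for a single response, the first PLS weight vector is proportional to \(X^T y\) computed on standardized predictors and a centered response. It does not reproduce the internals of plsr().

```r
# Simulated data (an assumption for illustration; not the Hitters data)
set.seed(1)
n  <- 200
z  <- rnorm(n)                    # shared latent factor behind x1 and x2
x1 <- z + rnorm(n, sd = 0.3)
x2 <- z + rnorm(n, sd = 0.3)
x3 <- rnorm(n)                    # independent of x1 and x2
y  <- x3 + rnorm(n)               # the response depends only on x3

X  <- scale(cbind(x1, x2, x3))    # standardize predictors, as scale = TRUE does
yc <- y - mean(y)                 # center the response

# First principal component score: the direction of maximal predictor variance,
# chosen without looking at y
pc1 <- prcomp(X)$x[, 1]

# First PLS score: weights proportional to X'y, normalized to unit length
w  <- crossprod(X, yc)
w  <- w / sqrt(sum(w^2))
t1 <- drop(X %*% w)

# Share of variance in y explained by each one-component fit
r2_pcr <- summary(lm(y ~ pc1))$r.squared
r2_pls <- summary(lm(y ~ t1))$r.squared
print(c(PCR = r2_pcr, PLS = r2_pls))
```

In this construction the first principal component follows the shared variation in x1 and x2, which is unrelated to y, while the PLS weights concentrate on x3. The one-component PLS fit should therefore explain considerably more of the variance in y than the one-component PCR fit.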
Using the Boston dataset, try creating a PLS model with the full dataset and store it in pls.fit (use the ncomp you determined in the previous exercise).
Assume that the MASS and pls libraries have been loaded, and that the Boston dataset has been loaded and attached.