For the small, simulated datasets we used before, we knew what value of \(K\) to pick.
But for real applications, the true value of \(K\) is unknown.
A popular way to determine the best value for \(K\) is to plot the within-cluster sum of squares for several values of \(K\) and look for the elbow in the graph.
We can try this with the animals data set.
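As a rough illustration of the elbow approach, the sketch below runs k-means for \(K\) from 1 to 12 and plots the total within-cluster sum of squares against \(K\). It uses simulated data rather than the animals data set; the matrix x, its three groups, and the vector k_values are assumptions made only for this example. The tot.withinss component of the object returned by kmeans holds the total within-cluster sum of squares.

```r
# Simulated data (an assumption for illustration, not the animals data):
# three well-separated groups of 50 points each in two dimensions.
set.seed(1)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 8, sd = 0.5), ncol = 2)
)

k_values <- 1:12                        # candidate numbers of clusters
within <- numeric(length(k_values))     # one entry per value of K

for (i in k_values) {
  set.seed(i)                           # seed equals the loop index
  fit <- kmeans(x, centers = i, nstart = 20)
  within[i] <- fit$tot.withinss         # total within-cluster sum of squares
}

# The "elbow" is the point after which the curve flattens out.
plot(k_values, within, type = "b",
     xlab = "Number of clusters K",
     ylab = "Within-cluster sum of squares")
```

With data like this, the curve should drop sharply up to \(K = 3\) and flatten afterwards. To adapt the sketch to another data set, replace x with a numeric matrix or data frame that contains no missing values.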
Run k-means with nstart=20.
You should run this for the values 1 through 12 for \(K\).
Set a seed for every iteration.
The seed should be the value of \(i\).
In within, store the within-cluster sum of squares.

Look at the plot and answer the following question:
Assume that:
the cluster library has been loaded
the animals dataset has been loaded and attached
missing values in animals have been removed