For the small simulated datasets we used before, we knew which value of \(K\) to pick.
In real applications, however, the true value of \(K\) is unknown.
A popular way to choose \(K\) is to plot the total within-cluster sum of squares against several values of \(K\) and look for an elbow in the graph: the point after which adding more clusters reduces it only slightly.
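For reference, the quantity on the vertical axis of that plot is the total within-cluster sum of squares, denoted here \(W(K)\). Writing \(C_k\) for the set of observations assigned to cluster \(k\) and \(\bar{x}_k\) for their mean, it is

\[
W(K) = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2 .
\]

The best achievable \(W(K)\) never increases as \(K\) grows, and it reaches 0 once every distinct observation has its own cluster. Simply minimising it is therefore not a useful criterion; the elbow marks where the decrease levels off.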
We can try this with the animals dataset.
Run k-means with nstart = 20 for every value of \(K\) from 1 to 12.
Set a seed in every iteration of the loop; the seed should be the loop index \(i\), which is also the current value of \(K\).
Store the within-cluster sum of squares in a vector called within. Look at the plot and answer the following question:
Assume that:
- the cluster library has been loaded;
- the animals dataset has been loaded and attached;
- the missing values (NAs) in animals have been removed.
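The exercise is meant to be solved in R with kmeans(), whose result includes a tot.withinss component. As a language-neutral sketch of the same procedure, the Python code below implements plain Lloyd's k-means (a swapped-in technique: R's kmeans() defaults to the Hartigan-Wong algorithm), keeps the best result over nstart random starts, seeds each value of \(K\) with that value as the exercise asks, and collects the results in a list named within. The data are a hypothetical stand-in (three synthetic Gaussian blobs), not the animals dataset, and the helper names sq_dist, kmeans_once and best_wss are illustrative only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means started from k random distinct points.

    Returns the total within-cluster sum of squares of the final partition.
    """
    distinct = sorted(set(map(tuple, points)))
    centers = [list(c) for c in rng.sample(distinct, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def best_wss(points, k, nstart=20, seed=None):
    """Analogue of nstart: lowest within-cluster SS over nstart random starts."""
    rng = random.Random(seed)  # plays the role of set.seed(seed)
    return min(kmeans_once(points, k, rng) for _ in range(nstart))


# Hypothetical stand-in data: three well-separated 2-D Gaussian blobs.
data_rng = random.Random(0)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

# Elbow data: K = 1..12, seed = K, 20 random starts each.
within = [best_wss(points, i, nstart=20, seed=i) for i in range(1, 13)]
for i, w in enumerate(within, start=1):
    print(f"K={i:2d}  within-cluster SS = {w:9.2f}")
```

On data like these, the printed values should drop steeply up to \(K = 3\) and flatten afterwards; that bend is the elbow one would look for in the plot.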