For the small, simulated datasets we used before, we knew what value of \(K\) to pick. In real applications, however, the true value of \(K\) is unknown. A popular way to choose \(K\) is to plot the within-cluster sum of squares for several values of \(K\) and look for the "elbow" in the graph, that is, the point after which increasing \(K\) yields only small further reductions. We can try this with the animals data set.
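
For reference, the within-cluster sum of squares mentioned above can be written as follows. The notation is an assumption made for illustration: \(C_k\) denotes the set of observations assigned to cluster \(k\), and \(\mu_k\) denotes the mean (centroid) of that cluster.

```latex
\[
W(K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
\]
```

In R, the total over all clusters is the quantity that kmeans reports as tot.withinss.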

Questions

Use a for loop to perform K-means clustering with \(K = i\) and nstart=20. Run the loop for \(i = 1, 2, \ldots, 12\). Set a seed in every iteration; the seed should be the value of \(i\).
Store the within-cluster sum of squares for \(K = i\) at the \(i\)th index of within.

Look at the plot and answer the following question:

MC1:
A) With higher values of \(K\), the within-cluster sum of squares always goes down.
B) We see larger drops in within-cluster sum of squares between successive higher values of \(K\) than between successive smaller values.
- 1) Both statements are true.
- 2) Both statements are false.
- 3) A is true and B is false.
- 4) A is false and B is true.

Assume that:

The cluster library has been loaded
The animals dataset has been loaded and attached
The NA values of animals have been removed
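
The assumptions above correspond roughly to the setup below. This is only a sketch: the exercise environment may prepare the data differently, and overwriting animals with the cleaned version is a choice made here for illustration.

```r
# Sketch of the assumed setup; the actual exercise environment may differ.
library(cluster)             # the cluster package provides the animals data set
data(animals)                # load the data set into the workspace
animals <- na.omit(animals)  # drop rows that contain NA values
attach(animals)              # make the columns accessible by name
```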