The Lloyd algorithm is one of the most popular clustering heuristics for the k-Means Clustering Problem. It first chooses k arbitrary points from Data as the initial Centers and then iteratively performs the following two steps:

  • Centers to Clusters: After centers have been selected, assign each data point to the cluster corresponding to its nearest center; ties are broken arbitrarily.
  • Clusters to Centers: After data points have been assigned to clusters, assign each cluster's center of gravity to be the cluster's new center.

We say that the Lloyd algorithm has converged if the centers (and therefore their clusters) stop changing between iterations.
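The two steps and the convergence test can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a reference solution: the helper names are hypothetical, the first k data points serve as the arbitrary initial centers, ties go to the center with the lowest index, and a cluster that ends up empty keeps its old center. It uses math.dist, which requires Python 3.8 or later.

```python
import math


def centers_to_clusters(data, centers):
    """Assign each point to the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for point in data:
        # min() returns the first minimum, so ties go to the lowest index.
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(point, centers[i]))
        clusters[nearest].append(point)
    return clusters


def clusters_to_centers(clusters, old_centers):
    """Replace each center by the center of gravity of its cluster."""
    new_centers = []
    for cluster, old in zip(clusters, old_centers):
        if cluster:
            # zip(*cluster) groups the coordinates per dimension.
            new_centers.append(
                tuple(sum(coords) / len(cluster) for coords in zip(*cluster)))
        else:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(old)
    return new_centers


def lloyd(data, k):
    """Iterate both steps until the centers stop changing."""
    # Assumption: the first k points are the arbitrary initial centers.
    centers = [tuple(float(x) for x in point) for point in data[:k]]
    while True:
        clusters = centers_to_clusters(data, centers)
        new_centers = clusters_to_centers(clusters, centers)
        if new_centers == centers:  # converged
            return centers
        centers = new_centers
```

For instance, lloyd([(0, 0), (1, 0), (10, 0), (11, 0)], 2) starts from the centers (0, 0) and (1, 0) and settles on (0.5, 0.0) and (10.5, 0.0) after two updates.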

Assignment

Write a function lloyd_algorithm that takes the location of a file containing a list of n-dimensional tuples (in Python notation) and an integer $$k$$, the number of centers to compute.

The function returns the $$k$$ centers of the data points as a set of tuples of floats.
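Reading the input could look like the sketch below. The exact file layout is not specified here, so this assumes one tuple per line, for example (1.0, 2.0); the helper name read_points is hypothetical. The standard library function ast.literal_eval parses a Python literal without executing arbitrary code.

```python
import ast


def read_points(path):
    """Read one tuple per line (assumed layout) into a list of float tuples."""
    points = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                # literal_eval turns the text "(1, 2)" into the tuple (1, 2).
                points.append(tuple(float(x) for x in ast.literal_eval(line)))
    return points
```

Once the centers have been computed as a list of tuples, wrapping that list in set(...) gives the required return type.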

Example

In the following interactive session, we assume the text file data01.txt to be located in the current directory.

>>> lloyd_algorithm('data01.txt', 1)
{(41.6, 49.5)}