The soft k-means clustering algorithm starts from randomly chosen centers and iterates the following two steps:

We begin with the "Centers to Soft Clusters" step. If we think about the centers as stars and the data points as planets, then the closer a point is to a center, the stronger that center's "pull" should be on the point. Given k centers $$\textit{Centers} = (x_1, \dots, x_k)$$ and n points $$\textit{Data} = (\textit{Data}_1, \dots, \textit{Data}_n)$$, we therefore need to construct a $$k \times n$$ responsibility matrix HiddenMatrix for which $$\textit{HiddenMatrix}_{i,j}$$ is the pull of center i on data point j. This pull can be computed according to the Newtonian inverse-square law of gravitation,

$$ \textit{HiddenMatrix}_{i, j} = \dfrac{1/d(\textit{Data}_j, x_i)^2}{\sum_{\text{all centers }x_{i'}} 1/d(\textit{Data}_j , x_{i'})^2} $$
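The inverse-square formula can be sketched directly in Python. The function name `newtonian_responsibilities` and the column-wise normalization loop are my own framing; note that the code assumes no data point coincides exactly with a center, which would divide by zero:

```python
def newtonian_responsibilities(data, centers):
    """Build a k x n responsibility matrix via the inverse-square law.

    Entry [i][j] is the pull of center i on point j: 1/d^2, normalized
    over all centers so that each column sums to 1.
    Assumes no point coincides with a center (d > 0).
    """
    def dist2(p, q):
        # squared Euclidean distance between two tuples
        return sum((pc - qc) ** 2 for pc, qc in zip(p, q))

    # raw pulls: one row per center, one column per data point
    hidden = [[1.0 / dist2(point, x) for point in data] for x in centers]

    # normalize each column (each data point) over all centers
    for j in range(len(data)):
        column_sum = sum(row[j] for row in hidden)
        for row in hidden:
            row[j] /= column_sum
    return hidden
```

For example, with centers at 0 and 4 on the line, the point at 1 is pulled nine times more strongly toward the nearer center, so its column is (0.9, 0.1).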

Unfortunately for Newton fans, the following partition function from statistical physics often works better in practice:

$$ \textit{HiddenMatrix}_{i,j} = \dfrac{e^{-\beta \cdot d(\textit{Data}_j,\, x_i)}}{\sum_{\text{all centers }x_{i'}} e^{-\beta \cdot d(\textit{Data}_{j},\, x_{i'})}} $$
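The partition-function variant differs from the gravitational one only in the weight assigned to each (center, point) pair: $$e^{-\beta \cdot d}$$ instead of $$1/d^2$$. A minimal sketch (the function name is an assumption, not part of the statement):

```python
import math

def partition_responsibilities(data, centers, beta):
    """k x n responsibility matrix using e^{-beta * d} weights,
    normalized so each column (data point) sums to 1."""
    def dist(p, q):
        # Euclidean distance between two tuples
        return math.sqrt(sum((pc - qc) ** 2 for pc, qc in zip(p, q)))

    hidden = [[math.exp(-beta * dist(point, x)) for point in data]
              for x in centers]
    for j in range(len(data)):
        column_sum = sum(row[j] for row in hidden)
        for row in hidden:
            row[j] /= column_sum
    return hidden
```

Unlike the inverse-square weights, these weights are well-defined even when a point sits exactly on a center ($$e^0 = 1$$).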

In this formula, e is the base of the natural logarithm ($$e\approx2.72$$), and $$\beta$$ is a parameter reflecting the amount of flexibility in our soft assignment, called (appropriately enough) the stiffness parameter.

In soft k-means clustering, if we let $$\textit{HiddenMatrix}_i$$ denote the i-th row of HiddenMatrix, then we can update center $$x_i$$ using an analogue of the above formulas. Specifically, we will define the j-th coordinate of center $$x_i$$, denoted $$x_{i,j}$$, as

$$ x_{i, j} = \dfrac{\textit{HiddenMatrix}_i \cdot \textit{Data}^j}{\textit{HiddenMatrix}_i\cdot\overrightarrow{1}} $$

Here, $$\textit{Data}^j$$ is the n-dimensional vector holding the j-th coordinates of the n points in Data, and $$\overrightarrow{1}$$ is the n-dimensional all-ones vector, so the denominator is the total responsibility of center $$x_i$$.

The updated center xi is called a weighted center of gravity of the points Data.
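The update rule above can be sketched as a responsibility-weighted mean of the data points; `update_centers` is an assumed helper name:

```python
def update_centers(data, hidden):
    """Recompute each center as the weighted center of gravity of Data.

    Row i of hidden is HiddenMatrix_i; its dot product with the j-th
    coordinate vector of Data, divided by its dot product with the
    all-ones vector (i.e. its sum), gives coordinate j of center i.
    """
    dims = len(data[0])
    centers = []
    for row in hidden:                 # row = HiddenMatrix_i
        total = sum(row)               # HiddenMatrix_i . (1, ..., 1)
        centers.append(tuple(
            sum(w * point[j] for w, point in zip(row, data)) / total
            for j in range(dims)
        ))
    return centers
```

When a row of HiddenMatrix is uniform, the update reduces to the ordinary center of gravity of all points.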

Assignment

Add a method soft_kmeans that takes the location of a file containing a list of n-dimensional tuples, an integer $$k$$ giving the required number of centers, and a stiffness parameter $$\beta$$.

The function returns the $$k$$ centers of the data points as a set of tuples of floats.
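A minimal sketch of the whole procedure might look as follows. The file format (one whitespace-separated point per line), the initialization (the first k points rather than randomly chosen centers), and the fixed iteration count are all assumptions not pinned down by the statement above:

```python
import math

def soft_kmeans(filename, k, beta, steps=100):
    """Soft k-means sketch: alternate responsibilities (partition
    function) and weighted-center updates for a fixed number of steps.

    Assumes one whitespace-separated point per line in the file and
    uses the first k points as initial centers (a deterministic
    stand-in for random initialization).
    """
    with open(filename) as f:
        data = [tuple(map(float, line.split())) for line in f if line.strip()]

    centers = data[:k]
    dims = len(data[0])

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    for _ in range(steps):
        # Centers to Soft Clusters: partition-function responsibilities
        hidden = [[math.exp(-beta * dist(p, x)) for p in data]
                  for x in centers]
        for j in range(len(data)):
            column_sum = sum(row[j] for row in hidden)
            for row in hidden:
                row[j] /= column_sum
        # Soft Clusters to Centers: weighted centers of gravity
        centers = [
            tuple(sum(w * p[j] for w, p in zip(row, data)) / sum(row)
                  for j in range(dims))
            for row in hidden
        ]
    return set(centers)
```

With k = 1 every responsibility normalizes to 1, so the single returned center is simply the center of gravity of the data, matching the shape of the example output below.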

Example

In the following interactive session, we assume the TXT file data01.txt to be located in the current directory.

>>> soft_kmeans('data01.txt', 1, 4.478143076533713)
{(38.893358437857046,)}