Here's the pseudocode for the Unweighted Pair Group Method with Arithmetic Mean (UPGMA):

UPGMA(D)
    Clustersn single-element clusters labeled 1, …, n 
    construct a graph T with n isolated nodes labeled by single elements 1, …, n 
    for every node v in T 
        Age(v) ← 0
    while there is more than one cluster 
        find the two closest clusters Ci and Cj
        merge Ci and Cj into a new cluster Cnew with |Ci| + |Cj| elements
        add a new node labeled by cluster Cnew to T
        connect node Cnew to Ci and Cj by directed edges
        remove the rows and columns of D corresponding to Ci and Cj
        remove Ci and Cj from Clusters
        add a row/column to D for Cnew by computing D(Cnew, C) for each C in Clusters
        add Cnew to Clusters
    root ← the node in T corresponding to the remaining cluster
    for each edge (v, w) in T
        length of (v, w) ← Age(v) - Age(w)
    return tree T

Assignment

Define a class Tree that can be used to represent rooted trees whose nodes are uniquely labeled and whose edges are weighted with a floating point number. The initialisation method takes the label (an integer or a string) of the root node $$r$$ and zero or more subtrees that are connected to the root node. Each connected subtree is represented as a tuple $$(s, w)$$, with $$s$$ the subtree (an object of the class Tree) and $$w$$ (a floating point number) the weight of the edge connecting the subtree $$s$$ to the root node $$r$$. Make sure the built-in function repr returns string representations of objects of the class Tree that are formatted according to the example given below.

Add a method UPGMA to the class DistanceMatrix that takes no arguments and returns the rooted tree (an object of the class Tree) that results from applying the UPGMA algorithm on the distance matrix.

Note on formatting

The n leaves must be labeled 0, 1, …, n - 1 in order of their appearance in the distance matrix. Internal nodes must be uniquely labeled, but the actual labeling can be freely chosen (integers, strings, …).

Example

In the following example we assume the text file distances.txt1 to be located in the current directory.

>>> tree = Tree('A', (Tree('B'), 6), (Tree('C'), 5), (Tree('D'), 11))
>>> tree 
Tree('A', (Tree('B'), 6), (Tree('C'), 5), (Tree('D'), 11))

>>> D = DistanceMatrix.loadtxt('distances.txt')
>>> D.UPGMA()
Tree(6, (Tree(5, (Tree(0), 7.0), (Tree(4, (Tree(2), 5.0), (Tree(3), 5.0)), 2.0)), 1.833333333333334), (Tree(1), 8.833333333333334))