As buds give rise by growth to fresh buds, and these, if vigorous, branch out and overtop on all sides many a feebler branch, so by generation I believe it has been with the great Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications.
Charles Darwin, The Origin of Species
A century and a half has passed since the publication of Darwin's magnum opus, and yet the construction of a single Tree of Life uniting life on Earth still has not been completed, with perhaps as many as 90% of all living species not yet catalogued (although a beautiful interactive animation has been produced by OneZoom1).
To get an insight about state-of-art attempts to build this tree, you may take a look at the Tree of Life Web Project2 — a collaborative effort of biologists from around the world to combine information about diversity of life on Earth. It is a peer-reviewed ongoing project started in 1995, now it holds more than 10,000 pages with characteristics of different groups of organisms and their evolutionary history, and tree still grows. A similar effort is the Encyclopedia of Life3.
Instead of trying to construct the entire Tree of Life all at once, we often wish to form a simpler tree in which a collection of species have been clumped together for the sake of simplicity. Such a group is called a taxon (pl. taxa). For a given collection of taxa, a phylogeny is a treelike diagram that best represents the evolutionary connections between the taxa: the construction of a particular phylogeny depends on our specific assumptions regarding how these evolutionary relationships should be interpreted.
An undirected graph is connected if there is a path connecting any two nodes. A tree is a connected (undirected) graph containing no cycles. This definition forces the tree to have a branching structure organized around a central core of nodes, just like its living counterpart.
We have already grown familiar with trees in "Mendel's first law4", where we introduced the probability tree diagram to visualize the outcomes of a random variable.
In the creation of a phylogeny, taxa are encoded by the tree's leaves, or nodes having degree 1. A node of a tree having degree larger than 1 is called an internal node.
Write a function missingEdges that takes a positive integer $$n$$ and the location of a text file that contains an adjacency list corresponding to a graph on $$n$$ nodes (labeled $$1, 2, \ldots, n$$) that contains no cycles. Each line of the file contains two space-separated integers that describe an edge relation of the graph. The function must return the minimum number of edges that can be added to the graph to produce a tree.
>>> missingEdges(10, 'data01.txt') 3 >>> missingEdges(826, 'data02.txt') 801
After solving this assignment, a standard mathematical exercise for the technically minded is to verify that every tree having 2 or more nodes must contain at least two leaves.