A node $$v$$ in a directed graph $$\mathcal{G}$$ is called a 1-in-1-out node if its indegree and outdegree are both equal to 1, i.e., In(v) = Out(v) = 1.  A maximal non-branching path is a path whose internal nodes are 1-in-1-out nodes and whose initial and final nodes are not 1-in-1-out nodes. This includes the special case when $$\mathcal{G}$$ has a connected component that is an isolated cycle, in which all nodes are 1-in-1-out nodes.

The MaximalNonBranchingPaths pseudocode below generates all non-branching paths in a graph. It iterates through all nodes of the graph that are not 1-in-1-out nodes and generates all non-branching paths starting at each such node. In a final step, MaximalNonBranchingPaths finds all isolated cycles in the graph.

MaximalNonBranchingPaths(G)
    P ← empty set
    for each node v in graph G
        if node v is not a 1-in-1-out node
            if Out(v) > 0
                for each outgoing edge (v, w) from node v
                    p ← the path consisting of the single edge (v, w)
                    while node w is a 1-in-1-out node
                        extend path p by the outgoing edge (w, u) from node w 
                        wu
                    add path p to the set P
    for each isolated cycle c in graph G
        add cycle c to the set P
    return the set P

Assignment

Write a function maximal_nonbranching_paths that takes the location of a text file containing the description of an directed graph $$\mathcal{G}$$, where the graph is given in the form of an adjacency list. Each line of the file contains the label (an integer) of a starting node, a space, the characters ->, another space and a comma-separated list containing the lexicographically sorted labels (integers) of all ending nodes connected to that starting node. The function must return a set containing all maximal non-branching paths in the graph, with each path represented as a tuple of node labels. Note that maximal non-branching cycles are tuples with the first element being the same as the last element.

Example

In the following interactive session, we assume the FASTA file data01.txt1 to be located in the current directory.

>>> maximal_nonbranching_paths('data01.txt')
{(6, 7, 6), (3, 5), (3, 4), (1, 2, 3)}