Another Unix command for comparing two files is comm (short for “common”). It compares two sorted files line by line and lets you retrieve the lines they have in common, as well as the lines that occur in only one of the files.
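As a minimal sketch of the default behaviour, the example below runs comm on two small toy files. The file names left.txt and right.txt and their contents are made up for illustration and are not the assignment data.

```shell
# Hypothetical toy files (already sorted), created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/left.txt"
printf 'banana\ncherry\ndate\n'  > "$tmpdir/right.txt"

# Without options, comm prints three tab-separated columns:
#   column 1 (no indent): lines that occur only in the first file
#   column 2 (one tab):   lines that occur only in the second file
#   column 3 (two tabs):  lines that occur in both files
comm "$tmpdir/left.txt" "$tmpdir/right.txt"
```

Here apple ends up in the first column, date in the second, and banana and cherry in the third. The manual page (man comm) describes options for suppressing individual columns.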

Sorting file contents

The lines in the input files for this exercise have already been sorted. Later we will introduce the sort command, which you can use to sort files yourself. Besides comm, other Unix commands also require their input to be (partially) sorted, for example uniq. In keeping with the DOTADIW principle (do one thing and do it well), these commands do not sort the input for you but expect it to be sorted already.
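To illustrate why the ordering of the input matters, the sketch below feeds uniq the same three toy lines in two different orders. The words and the variable names scattered and grouped are made up for this example.

```shell
# Duplicates are NOT adjacent: uniq leaves all three lines in place.
scattered=$(printf 'pear\napple\npear\n' | uniq)

# Duplicates ARE adjacent (grouped, though not alphabetically sorted):
# uniq collapses them into a single line.
grouped=$(printf 'pear\npear\napple\n' | uniq)

printf '%s\n' "$scattered"
echo "---"
printf '%s\n' "$grouped"
```

The first result still contains pear twice, the second only once. uniq compares each line only with the line before it, so it relies on the input having been arranged beforehand.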

Assignment

The text files AT1G27630.txt and AT2G25940.txt contain information about two Arabidopsis thaliana genes. Each of these genes has been annotated with a series of GO (gene ontology) terms. The files contain these GO terms, one term per line, sorted alphabetically.

Try to figure out how the comm command works. Use it to produce the following results:

  1. Use comm to output all GO terms that AT1G27630 and AT2G25940 have in common.
  2. Use comm again, but now output all GO terms that are specific to AT2G25940, i.e. terms that are not annotated to AT1G27630.