One of the UNIX commands for comparing two files is comm (short for “common”). It compares two sorted files line by line and makes it easy to retrieve the lines they have in common, as well as the lines that occur in only one of the files.
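The short sketch below illustrates comm's default output on two small made-up files. The file names a.txt and b.txt and the GO identifiers in them are placeholders, not the assignment's data. By default comm prints three tab-separated columns: lines only in the first file, lines only in the second file, and lines in both. The options -1, -2 and -3 suppress the corresponding column.

```shell
# Create two small, already sorted example files (placeholder content)
tmp=$(mktemp -d)
trap 'rm -r "$tmp"' EXIT   # clean up the temporary directory when the script exits
printf '%s\n' 'GO:0003700' 'GO:0005634' 'GO:0006355' > "$tmp/a.txt"
printf '%s\n' 'GO:0005634' 'GO:0006355' 'GO:0009737' > "$tmp/b.txt"

# Default output:
#   column 1 (no indent):   lines only in a.txt
#   column 2 (one tab):     lines only in b.txt
#   column 3 (two tabs):    lines in both files
comm "$tmp/a.txt" "$tmp/b.txt"

# -3 suppresses column 3, leaving only the lines unique to either file
comm -3 "$tmp/a.txt" "$tmp/b.txt"
```

Combining the options (for example -1 together with -3) keeps only the column that was not suppressed. Both input files must be sorted in the same order that comm expects (the current locale's collation); otherwise the comparison can give wrong results.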

Assignment

The text files AT1G27630.txt and AT2G25940.txt contain information about two Arabidopsis thaliana genes. Each of these genes has been annotated with a series of GO (Gene Ontology) terms. The files list these GO terms, one term per line, sorted alphabetically.

Try to figure out how the comm command works. Use it to produce the following results:

  1. Use comm to output all GO terms that AT1G27630 and AT2G25940 have in common.
  2. Again use comm, but now output all GO terms that are specific to AT2G25940, i.e., that are not annotated to AT1G27630.