Another Unix command used to compare two files is comm (short for “common”).
It can be used to compare sorted files line by line and allows you to easily retrieve the common lines
(as well as the lines occurring in only one of the files).
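As a minimal sketch of what to expect, the following creates two small sorted files and runs comm on them without any options. The file names a.txt and b.txt and their contents are made up for illustration and are not part of the exercise.

```shell
# Hypothetical toy files, created in a temporary directory for this illustration only.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/a.txt"
printf 'banana\ncherry\ndate\n' > "$workdir/b.txt"

# Without options, comm prints three tab-indented columns:
#   column 1: lines that occur only in the first file
#   column 2: lines that occur only in the second file
#   column 3: lines that occur in both files
comm "$workdir/a.txt" "$workdir/b.txt"
```

Here apple ends up in column 1, date in column 2, and banana and cherry in column 3. The options -1, -2 and -3 each suppress the corresponding column, and they can be combined.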
Sorting file contents
The lines in the input files for this exercise have already been sorted. Later we will introduce the sort command for this. Besides comm, there are other Unix commands that require their input to be (partially) sorted, like uniq. In keeping with the DOTADIW principle (do one thing and do it well), these commands don’t sort the input for you, but expect it to have been sorted already.
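To see why sorted input matters, here is a small sketch using uniq on made-up input (the three lines b, a, b are purely illustrative). It uses sort only to put duplicate lines next to each other.

```shell
# uniq only collapses adjacent duplicate lines, so on unsorted input
# both "b" lines survive: this prints b, a, b.
printf 'b\na\nb\n' | uniq

# Sorting first makes the duplicates adjacent, so uniq can collapse them:
# this prints a, b.
printf 'b\na\nb\n' | sort | uniq
```

The same reasoning applies to comm: it walks through both files in parallel and relies on their order to decide whether a line is missing from the other file.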
The text files AT1G27630.txt and AT2G25940.txt contain information about two Arabidopsis thaliana genes. Each of these genes has been annotated with a series of GO (gene ontology) terms. The files contain these GO terms, one term per line, sorted alphabetically.
Try to figure out how the comm command works. Use it to produce the following results:
1. Use comm to output all GO terms that AT1G27630 and AT2G25940 have in common.
2. Use comm again, but now output all GO terms that are specific to AT2G25940, i.e. that are not annotated to AT1G27630.