A DNA sequence is a string of the characters A, C, G and T (e.g. AATACCGCA). The regular expression [ACGT][ACGT]* therefore describes all possible DNA sequences with at least one base pair.
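As a minimal sketch of how this pattern behaves with grep, the snippet below filters a small made-up sample file (hypothetical data, not the assignment's file) down to the lines that consist entirely of A, C, G and T. It assumes a POSIX grep; the -x option requires the pattern to match the whole line, so lines with other characters and empty lines are dropped.

```shell
# Build a small sample file (hypothetical data for illustration only).
sample=$(mktemp)
printf '%s\n' 'AATACCGCA' 'ACGTX' 'GATTACA' '' > "$sample"

# -x: the pattern must match the entire line, not just part of it.
valid=$(grep -x '[ACGT][ACGT]*' "$sample")
printf '%s\n' "$valid"

rm -f "$sample"
```

Without -x, grep would also report ACGTX, because the pattern matches the substring ACGT inside that line.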

Assignment

The current directory contains the text file dna.txt, which holds DNA sequences, one per line. Use the grep command to restrict the list to the DNA sequences that

  1. end with AA (4 lines)

  2. start and end with C (11 lines)

  3. contain the substring TATA (11 lines)

  4. contain at least three G's (99 lines)

  5. contain at least three consecutive T's (24 lines)

Each description is followed, in parentheses, by the number of matching DNA sequences (lines) in the file dna.txt. Try to keep the regular expressions as concise as possible.
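One way to check a candidate pattern against the expected line count is grep's -c option, which prints the number of matching lines instead of the lines themselves. The sketch below uses a made-up sample file and a pattern that is not part of the assignment (sequences that start with G); both are assumptions for illustration only.

```shell
# Hypothetical sample data, standing in for the real input file.
sample=$(mktemp)
printf '%s\n' 'GATTACA' 'CCGTA' 'GGC' 'TTAG' > "$sample"

# -c: print the number of matching lines rather than the lines.
# ^ anchors the match at the start of the line.
count=$(grep -c '^G' "$sample")
echo "$count"

rm -f "$sample"
```

Run the same way against dna.txt, the printed count can be compared directly with the number given for each description.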