The FASTA file format is used to describe nucleotide or amino acid sequences. Each sequence is preceded by a header line that starts with a > sign followed by the sequence id. For example, the FASTA file below contains three short amino acid sequences with sequence ids Seq1, Seq2 and Seq3. As you can see, a sequence may be split across multiple lines, depending on its length.

>Seq1
GSRAS
>Seq2
GQIHHNRRIIGRYRVNSPYDKCILSKATFGWEWFFYSFNAEITFKCYRVIWASMSKWNRVLLSLYRWILHRQKEHK
KTRYTTWGKSQCTMCRRDHMFDAYAWLYQQAIFAKVPMSKAGHWFDTTDD
>Seq3
TKNHAKFFHGRQELIKLRQHDSIIHEKDHAFMVDHNIKLCFVSVSFEKAMYCMAGLVAKH
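
Because a sequence can wrap over several lines, it can help to first join each record into a single line. The following is a minimal sketch of that idea using awk. The file name example.fasta and the two short records in it are made up for illustration.

```shell
# Hypothetical two-record file; the second sequence is wrapped over two lines.
printf '>SeqA\nGSRAS\n>SeqB\nGQIHH\nNRRII\n' > example.fasta

# Print each header unchanged and concatenate the sequence lines that follow it.
linearized=$(cat example.fasta | awk '
  /^>/ { if (seq != "") print seq; print; seq = ""; next }  # header: flush previous sequence
       { seq = seq $0 }                                      # sequence line: append
  END  { if (seq != "") print seq }                          # flush the last sequence
')
echo "$linearized"
```

For this input the result has four lines: >SeqA, GSRAS, >SeqB and GQIHHNRRII.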

Write Unix pipelines (using one or more commands) to extract the following information from a given FASTA file:

  1. The number of sequences in the file. For the example above, this would be
    3
    
  2. The last two sequence ids in the file (without the > sign). For the example above, this would be
    Seq2
    Seq3
    
  3. The second sequence id in the file (without the > sign). For the example above, this would be
    Seq2
    
  4. The total number of sequence characters (amino acids or nucleotides) in the file, for all sequences combined. For the example above, this would be
    191
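
One possible set of pipelines for the four tasks is sketched below. It is not the only solution and makes some assumptions: every header line starts with > in the first column, the header holds only the sequence id (a description after the id would need extra trimming, for example with cut), and any whitespace inside sequence lines should not be counted. The variable names are illustrative.

```shell
# Recreate the example file from the text under the name used in the test command.
cat > test.fasta <<'EOF'
>Seq1
GSRAS
>Seq2
GQIHHNRRIIGRYRVNSPYDKCILSKATFGWEWFFYSFNAEITFKCYRVIWASMSKWNRVLLSLYRWILHRQKEHK
KTRYTTWGKSQCTMCRRDHMFDAYAWLYQQAIFAKVPMSKAGHWFDTTDD
>Seq3
TKNHAKFFHGRQELIKLRQHDSIIHEKDHAFMVDHNIKLCFVSVSFEKAMYCMAGLVAKH
EOF

# 1. Number of sequences: count the header lines.
n_seqs=$(cat test.fasta | grep -c '^>')

# 2. Last two sequence ids: keep headers, strip the leading '>', take the last two.
last_two=$(cat test.fasta | grep '^>' | sed 's/^>//' | tail -n 2)

# 3. Second sequence id: same filtering, but print only the second header.
second=$(cat test.fasta | grep '^>' | sed 's/^>//' | sed -n '2p')

# 4. Total sequence characters: drop headers, delete all whitespace, count what remains.
n_chars=$(cat test.fasta | grep -v '^>' | tr -d '[:space:]' | wc -c)

echo "$n_seqs"; echo "$last_two"; echo "$second"; echo "$n_chars"
```

On the example file these give 3, Seq2 and Seq3, Seq2, and 191, matching the expected values listed above. Some wc implementations pad the count with leading spaces, which may matter if the output is compared character by character.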
    

Each of your submitted pipelines will be tested on multiple FASTA files like this:

cat test.fasta | <your-pipeline>
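
To try a pipeline locally in the same way, one option is to keep it in a shell variable and run it against a small file whose answer is known. The sketch below assumes this workflow. The file small.fasta, the variable names and the candidate pipeline are illustrative and not part of the official test setup.

```shell
# Hypothetical test file with two sequences.
printf '>A\nACGT\nAC\n>B\nTTT\n' > small.fasta

# Candidate for <your-pipeline> (here: count the header lines).
pipeline="grep -c '^>'"
expected=2

# Run it in the same shape as the test command: cat <file> | <your-pipeline>
actual=$(cat small.fasta | sh -c "$pipeline")

if [ "$actual" = "$expected" ]; then
  result="PASS"
else
  result="FAIL: got '$actual', expected '$expected'"
fi
echo "$result"
```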

Submission guidelines

Submit only the text that replaces <your-pipeline> in the command above, without the leading cat test.fasta | part.