Exercise 2.3.1
Note: This exercise won’t work until BioPython is installed, so install it first!
Read the FASTA file data/sample.fa using BioPython and answer the following questions:
- How many sequences are in the file?
- What are the IDs and the lengths of the longest and the shortest sequences?
- Select the sequences longer than 500 bp. What is the average length of these sequences?
- Calculate and print the percentage of GC in each of the sequences.
- Write the selected sequences (those longer than 500 bp) into a FASTA file named long_sequences.fa.
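
One possible way to approach the questions above is sketched below. It is a starting point, not a reference solution. The helper names (gc_percent, summarize, main) are made up for this illustration, and the sketch assumes the file contains at least one record and that record IDs are unique. GC content is computed by counting bases directly, so the result does not depend on which Bio.SeqUtils helpers a given BioPython version provides. Only SeqIO.parse and SeqIO.write are BioPython calls.

```python
from statistics import mean


def gc_percent(sequence):
    """Return the percentage of G and C bases in a sequence string."""
    sequence = str(sequence).upper()
    if not sequence:
        return 0.0
    return 100.0 * (sequence.count("G") + sequence.count("C")) / len(sequence)


def summarize(records, min_length=500):
    """Answer the exercise questions for a list of (id, sequence) pairs.

    Assumes at least one record and unique IDs (illustrative simplification).
    """
    lengths = {seq_id: len(seq) for seq_id, seq in records}
    longest_id = max(lengths, key=lengths.get)
    shortest_id = min(lengths, key=lengths.get)
    # "Longer than 500 bp" is read as strictly greater than min_length.
    long_ids = [seq_id for seq_id, n in lengths.items() if n > min_length]
    return {
        "count": len(records),
        "longest": (longest_id, lengths[longest_id]),
        "shortest": (shortest_id, lengths[shortest_id]),
        "long_ids": long_ids,
        "mean_long_length": mean(lengths[i] for i in long_ids) if long_ids else None,
        "gc": {seq_id: gc_percent(seq) for seq_id, seq in records},
    }


def main(in_path="data/sample.fa", out_path="long_sequences.fa", min_length=500):
    # Imported here so the helpers above remain usable before BioPython is installed.
    from Bio import SeqIO

    seq_records = list(SeqIO.parse(in_path, "fasta"))
    summary = summarize([(r.id, str(r.seq)) for r in seq_records], min_length)

    print("Number of sequences:", summary["count"])
    print("Longest:", summary["longest"])
    print("Shortest:", summary["shortest"])
    print("Average length of long sequences:", summary["mean_long_length"])
    for seq_id, gc in summary["gc"].items():
        print(f"{seq_id}\tGC = {gc:.2f}%")

    # Write only the records that passed the length filter.
    long_records = [r for r in seq_records if len(r.seq) > min_length]
    SeqIO.write(long_records, out_path, "fasta")
```

Once BioPython is installed, calling main() from the directory that contains data/ prints the answers and writes long_sequences.fa. The summary logic is kept separate from file handling so it can be checked on small hand-made inputs.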