The Gene Transfer Format (GTF) is one of the most frequently used file formats in bioinformatics. It stores genomic coordinates in a tab-separated format. Every line describes a single feature with a start and an end coordinate. A feature can, for instance, be a gene, a transcript, or an exon.
https://www.ensembl.org/info/website/upload/gff.html
Some lines in GTF format:
chr9 HAVANA gene 32566787 32568619 . + . gene_id "ENSG00000241043.1"; transcript_id "ENSG00000241043.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GVQW1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GVQW1"; level 2; havana_gene "OTTHUMG00000019744.1";
chr13 ENSEMBL gene 113755563 113756608 . - . gene_id "ENSG00000268130.1"; transcript_id "ENSG00000268130.1"; gene_type "protein_coding"; gene_status "NOVEL"; gene_name "AL137002.1"; transcript_type "protein_coding"; transcript_status "NOVEL"; transcript_name "AL137002.1"; level 3;
chr10 HAVANA gene 27035522 27150016 . - . gene_id "ENSG00000136754.12"; transcript_id "ENSG00000136754.12"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ABI1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ABI1"; level 2; tag "ncRNA_host"; havana_gene "OTTHUMG00000017848.1";
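To make the layout of such lines concrete, here is a minimal Python sketch that splits one GTF line into its nine tab-separated columns (sequence name, source, feature, start, end, score, strand, frame, attributes). The function name `parse_gtf_line` and the returned dictionary keys are illustrative choices, not part of any library. The attribute parsing is deliberately simple: it assumes no semicolons occur inside quoted values and keeps only the last value when a key such as `tag` is repeated.

```python
def parse_gtf_line(line):
    """Split one GTF line into its nine columns (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields

    # Attributes look like: key "value"; key "value"; level 2;
    # Assumption: values never contain a semicolon.
    attrs = {}
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attrs,
    }


# A shortened version of the first example line, with real tab characters.
example = "\t".join([
    "chr9", "HAVANA", "gene", "32566787", "32568619", ".", "+", ".",
    'gene_id "ENSG00000241043.1"; gene_name "GVQW1"; level 2;',
])
record = parse_gtf_line(example)
print(record["seqname"], record["start"], record["end"],
      record["attributes"]["gene_name"])
# prints: chr9 32566787 32568619 GVQW1
```

For the counting task only the first column is needed, so full parsing like this is optional.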
Write a program that counts the number of elements in the GTF file per chromosome.
The only input is a file name. The file itself contains coordinates in the GTF format.
The output is the number of elements in the GTF file per chromosome. Every line contains a chromosome and its number of elements, separated by a tab. The chromosomes are sorted alphabetically, so chr10 comes before chr2.
Input:
gencode.gtf
Output:
chr1 375
chr10 175
chr11 206
chr12 191
chr13 77
chr14 129
chr15 140
chr16 138
chr17 186
chr18 73
chr19 193
chr2 265
chr20 89
chr21 45
chr22 81
chr3 210
chr4 177
chr5 189
chr6 194
chr7 190
chr8 157
chr9 150
chrM 5
chrX 164
chrY 56
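One possible approach to the task, sketched in Python: read the file line by line, take the first tab-separated column as the chromosome, and tally the lines with a `collections.Counter`. The function names `count_per_chromosome` and `format_counts` are illustrative. Two assumptions are not stated in the task: the file name is passed as the first command-line argument, and comment lines starting with `#` as well as blank lines are not counted as elements.

```python
import sys
from collections import Counter


def count_per_chromosome(path):
    """Count the feature lines in a GTF file per chromosome (first column)."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            # Assumption: header/comment lines and blank lines are not elements.
            if line.startswith("#") or not line.strip():
                continue
            chromosome = line.split("\t", 1)[0]
            counts[chromosome] += 1
    return counts


def format_counts(counts):
    """Return one 'chromosome<TAB>count' line per chromosome, sorted by name."""
    return "\n".join(f"{name}\t{counts[name]}" for name in sorted(counts))


# Only run when a file name was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(format_counts(count_per_chromosome(sys.argv[1])))
```

Python's default string sort is lexicographic, which reproduces the order of the example output: chr1, chr10, ..., chr19, chr2, ..., chr9, chrM, chrX, chrY. A natural (numeric) chromosome order would need a custom sort key.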