The Gene Transfer Format (GTF) is one of the most frequently used file formats in bioinformatics. It stores genomic coordinates in a tab-separated format. Every line describes a single feature with a start and an end coordinate. A feature can be, for instance, a gene, a transcript or an exon.
https://www.ensembl.org/info/website/upload/gff.html
Some lines in GTF format:
chr9	HAVANA	gene	32566787	32568619	.	+	.	gene_id "ENSG00000241043.1"; transcript_id "ENSG00000241043.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "GVQW1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "GVQW1"; level 2; havana_gene "OTTHUMG00000019744.1";
chr13	ENSEMBL	gene	113755563	113756608	.	-	.	gene_id "ENSG00000268130.1"; transcript_id "ENSG00000268130.1"; gene_type "protein_coding"; gene_status "NOVEL"; gene_name "AL137002.1"; transcript_type "protein_coding"; transcript_status "NOVEL"; transcript_name "AL137002.1"; level 3;
chr10	HAVANA	gene	27035522	27150016	.	-	.	gene_id "ENSG00000136754.12"; transcript_id "ENSG00000136754.12"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ABI1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ABI1"; level 2; tag "ncRNA_host"; havana_gene "OTTHUMG00000017848.1";
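To make the column layout concrete, here is a minimal Python sketch that splits one record into its leading columns. The names `Feature` and `parse_gtf_line` are made up for this illustration, and the sketch assumes the nine columns are separated by real tab characters, as the format requires.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """Leading columns of one GTF record (hypothetical helper type)."""
    seqname: str   # chromosome, e.g. "chr9"
    source: str    # annotation source, e.g. "HAVANA"
    feature: str   # feature type, e.g. "gene"
    start: int     # 1-based start coordinate
    end: int       # end coordinate, inclusive


def parse_gtf_line(line: str) -> Feature:
    """Split one tab-separated GTF record and keep the first five columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    return Feature(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]))


# Shortened version of the first sample record (attributes truncated for brevity).
record = 'chr9\tHAVANA\tgene\t32566787\t32568619\t.\t+\t.\tgene_id "ENSG00000241043.1";'
feat = parse_gtf_line(record)

# Both coordinates are inclusive, so the feature length is end - start + 1.
print(feat.seqname, feat.feature, feat.end - feat.start + 1)
```

For the counting task only the first column is needed, so a full parse like this is optional.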
Write a program that counts the number of elements in a GTF file per chromosome.
The only input is a file name. The file itself contains coordinates in the GTF format.
The output is the number of elements in the GTF file per chromosome. Every line contains a chromosome and its number of elements, separated by a tab. The chromosomes are sorted alphabetically.
Input:
gencode.gtf
Output:
chr1	375
chr10	175
chr11	206
chr12	191
chr13	77
chr14	129
chr15	140
chr16	138
chr17	186
chr18	73
chr19	193
chr2	265
chr20	89
chr21	45
chr22	81
chr3	210
chr4	177
chr5	189
chr6	194
chr7	190
chr8	157
chr9	150
chrM	5
chrX	164
chrY	56
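One possible approach, sketched in Python, is to tally the first column of every record and print the tallies in sorted order. The function names `count_per_chromosome` and `format_counts` are made up for this sketch. It also assumes that blank lines and header lines starting with `#` should be skipped, which the task statement does not say explicitly.

```python
import sys
from collections import Counter


def count_per_chromosome(lines):
    """Count GTF records per chromosome (first tab-separated column)."""
    counts = Counter()
    for line in lines:
        # Assumption: blank lines and '#' header/comment lines are not elements.
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def format_counts(counts):
    """Return 'chromosome<TAB>count' lines, sorted alphabetically by chromosome."""
    return [f"{chrom}\t{n}" for chrom, n in sorted(counts.items())]


# Run as: python count_gtf.py gencode.gtf  (the script name is arbitrary)
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        print("\n".join(format_counts(count_per_chromosome(handle))))
```

Plain string sorting places chr10 before chr2, which matches the ordering in the sample output above. Iterating over the file handle line by line keeps memory use low even for large annotation files.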