1. GTFtools

1.1 Description

GTFtools provides a set of functions to analyze various modes of gene models as described below.

Options Function Notes
-m For each gene, calculate merged, non-overlapping exons by merging the exons of all splice isoforms of that gene. The output is merged exons in bed format. Needed to calculate the non-overlapping exonic length of genes with multiple splice isoforms.
-d Calculate independent introns, defined as introns (or parts of introns) that do not overlap with any exon of any gene in the genome. They are calculated by subtracting merged exons from genes. The output is in bed format. Useful for intron retention detection.
-l Calculate gene lengths. Because a gene may have multiple isoforms, there is no single definition of its length. Four types of gene length are provided: the mean, median and maximum of the lengths of a gene's isoforms, and the length of the merged exons of all its isoforms (i.e. the non-overlapping exonic length). Needed, for example, to calculate FPKM in RNA-seq data analysis, where gene length is required.
-r Calculate transcript isoform lengths.
-g Output gene coordinates and ID mappings in bed format.
-s Output isoform coordinates and parent-gene IDs in bed format.
-e Output exons in bed format.
-i Output introns in bed format.
-k Calculate introns (or parts of introns) that overlap with exons of other isoforms. The output is in bed format.
-u Output UTRs in bed format.
-t Output transcription start site (TSS) regions in bed format.
-c Specify chromosomes to analyze. Dashes (-) and commas (,) are allowed. For example, ‘-c 1-5,X,Y’ indicates 7 chromosomes: 1 to 5 together with X and Y. Default is 1-22, X and Y.
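
The chromosome syntax accepted by -c can be illustrated with a short Python sketch. This is not the code used inside gtftools.py; the function name parse_chromosomes is hypothetical, and the sketch assumes that a dash is only used between two chromosome numbers.

```python
def parse_chromosomes(spec):
    """Expand a specification such as '1-5,X,Y' into a list of names.

    Hypothetical helper for illustration only; it assumes that a dash
    always joins two integers (e.g. '1-5'), never names such as 'X'.
    """
    chromosomes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            # A range covers every numbered chromosome from first to last.
            chromosomes.extend(str(n) for n in range(int(first), int(last) + 1))
        else:
            chromosomes.append(part)
    return chromosomes


print(parse_chromosomes("1-5,X,Y"))
# ['1', '2', '3', '4', '5', 'X', 'Y']
```

Under the same assumption, the default value (1-22, X and Y) expands to 24 names.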

1.2 Download

GTFtools is implemented as a pure Python script, which is freely available for non-commercial use.

Version Changes
GTFtools_0.8.0 Corrected an error in passing user-specified chromosome names to the function get_gene_length.

Note: a demo.gtf is included in this package to facilitate testing of this software.

2. Install

Download the above package and add its full path to the PATH variable (Linux, Unix, macOS).
Dependency: the 'argparse' Python module must be installed (it is part of the standard library from Python 2.7 and 3.2 onward).

3. Usage

In general, you can run 'gtftools.py -h' to display the help message.

Note: the examples below use the demo.gtf included in this package.
Example 1: For each gene, calculate non-overlapping exons by merging all exons of its splice isoforms:
gtftools.py -m merged_exons.bed demo.gtf
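
The idea of merging the exons of all isoforms of a gene can be sketched in a few lines of Python. This is an illustration of the general interval-merging technique, not the implementation in gtftools.py. The function name merge_exons and the coordinates are made up, intervals are assumed to be half-open (start, end) pairs as in bed files, and treating directly adjacent exons as one block is an assumption of this sketch.

```python
def merge_exons(exons):
    """Merge overlapping (start, end) intervals into non-overlapping blocks.

    Hypothetical helper for illustration only. Intervals are half-open,
    and blocks that touch (start == previous end) are joined as well.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend that block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


# Made-up gene with two isoforms on the same chromosome and strand.
isoform_a = [(100, 200), (300, 400)]
isoform_b = [(150, 250), (300, 350), (500, 600)]

print(merge_exons(isoform_a + isoform_b))
# [(100, 250), (300, 400), (500, 600)]
```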

Example 2: calculate four types of gene lengths:
gtftools.py -l gene_length.bed demo.gtf
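
To make the four length types concrete, the following Python sketch computes them for a made-up gene with two isoforms. It is not the code in gtftools.py and does not reproduce the layout of gene_length.bed. The function name gene_lengths and the coordinates are hypothetical, and intervals are assumed to be half-open (start, end) pairs.

```python
from statistics import mean, median


def gene_lengths(isoforms):
    """Return the four gene length types for one gene.

    `isoforms` is a list of isoforms, each a non-empty list of half-open
    (start, end) exon intervals. Hypothetical helper for illustration only.
    """
    # Length of each isoform is the sum of its exon lengths.
    isoform_lengths = [sum(end - start for start, end in exons) for exons in isoforms]

    # Merge the exons of all isoforms and add up the merged blocks.
    all_exons = sorted(exon for exons in isoforms for exon in exons)
    merged_length = 0
    block_start, block_end = all_exons[0]
    for start, end in all_exons[1:]:
        if start <= block_end:
            block_end = max(block_end, end)
        else:
            merged_length += block_end - block_start
            block_start, block_end = start, end
    merged_length += block_end - block_start

    return {
        "mean": mean(isoform_lengths),
        "median": median(isoform_lengths),
        "max": max(isoform_lengths),
        "merged": merged_length,
    }


# Made-up gene: isoform lengths are 200 and 250.
isoforms = [
    [(100, 200), (300, 400)],
    [(150, 250), (300, 350), (500, 600)],
]
lengths = gene_lengths(isoforms)
print(" ".join(f"{name}={value:g}" for name, value in lengths.items()))
# mean=225 median=225 max=250 merged=350
```

The merged length (350) is larger than any single isoform here because it covers exonic sequence from both isoforms, which is why the choice of length type matters for length-normalized measures such as FPKM.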

4. Contact

If you have any questions, please do not hesitate to contact me at:
Hongdong Li, hongdong@csu.edu.cn

5. How to cite?

If you use this tool, please cite the following work.

Hong-Dong Li, GTFtools: a Python package for analyzing various modes of gene models, bioRxiv, 263517, doi: https://doi.org/10.1101/263517

Developed at Center for Bioinformatics, Central South University, Changsha, P.R. China.