Different variants of a gene, known as isoforms, can be transcribed and translated at varying levels within a cell or tissue. These isoforms commonly result from alternative splicing, which produces distinct protein products with potentially different functions and expression patterns. Analyzing isoform expression is crucial for understanding cell-type identity, disease mechanisms like cancer, and the impact of genetic variations.
Current RNA-seq methods for quantifying isoform expression depend heavily on pre-existing transcript annotations. Yet thousands of unannotated, potentially disease-causing isoforms are actively expressed in each sample and remain invisible to these traditional approaches.
At the American Society of Human Genetics (ASHG) Annual Meeting, researchers from the University of Chicago and Columbia University introduce Torino, a computational workflow that leverages biobank-scale RNA-seq data to directly decode transcript structures and expression levels from read coverage alone. Torino enables transcriptome-wide discovery of novel, functionally important isoforms and their regulation to bridge RNA processing complexity with genetic architecture and disease.
Authors of the study include Yang Lin, PhD, associate professor of medicine, and Matthew Stephens, PhD, professor of statistics and human genetics at the University of Chicago.
Torino models RNA-seq data using Poisson non-negative matrix factorization with spatial smoothness priors, enabling it to infer latent transcript structures without annotations. This approach uncovers novel isoforms shaped by alternative splicing, intron retention, and alternative polyadenylation (APA).
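To give a rough feel for the factorization idea, the following is a minimal sketch of plain Poisson non-negative matrix factorization on a samples-by-positions coverage matrix, using the standard multiplicative updates for the Poisson (KL) objective. It is not Torino's implementation: the function names `poisson_nmf` and `poisson_loglik`, the toy data, and the choice of update rule are all assumptions made for illustration, and the spatial smoothness prior described above is deliberately left out.

```python
import numpy as np

def poisson_loglik(X, L, F, eps=1e-10):
    """Poisson log-likelihood of counts X under mean L @ F (constant terms dropped)."""
    M = L @ F + eps
    return float(np.sum(X * np.log(M) - M))

def poisson_nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative count matrix X (samples x positions) as L @ F.

    L (samples x k) plays the role of per-sample component abundances and
    F (k x positions) the role of latent coverage profiles. Hypothetical
    sketch: no smoothness prior, standard multiplicative updates only.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    L = rng.random((n, k)) + 0.1
    F = rng.random((k, p)) + 0.1
    for _ in range(n_iter):
        # Update loadings with the profiles held fixed.
        R = X / (L @ F + eps)
        L *= (R @ F.T) / (F.sum(axis=1) + eps)
        # Update profiles with the loadings held fixed.
        R = X / (L @ F + eps)
        F *= (L.T @ R) / (L.sum(axis=0)[:, None] + eps)
    return L, F

# Toy example: two overlapping "transcript structures" across 60 positions.
rng = np.random.default_rng(1)
F_true = np.zeros((2, 60))
F_true[0, 5:40] = 1.0    # component covering positions 5-39
F_true[1, 25:55] = 1.0   # component covering positions 25-54
L_true = rng.gamma(shape=2.0, scale=10.0, size=(30, 2))
X = rng.poisson(L_true @ F_true).astype(float)

L_hat, F_hat = poisson_nmf(X, k=2)
```

In this toy setting the rows of `F_hat` are the inferred coverage profiles and the columns of `L_hat` are their per-sample abundances, up to the usual scaling and ordering ambiguity of matrix factorization.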
The Genotype-Tissue Expression (GTEx) Portal is a comprehensive public resource for researchers studying tissue- and cell-specific gene expression and regulation across individuals, development, and species. Applied to 2,128 GTEx samples across 19 tissues, Torino accurately recovers 18,813 GENCODE isoforms spanning 15,232 protein-coding genes, while revealing extensive unannotated diversity, including over 10,000 novel cassette exon events, over 53,000 novel intron retention events, and 8,013 APA events. Thousands of these events exhibit strong tissue specificity, suggesting functional roles.
By harnessing Torino-inferred isoform abundances, the authors uncovered a median of 2,829 isoform QTLs per tissue, demonstrating widespread genetic control over RNA processing. Colocalization with 65 GWAS traits pinpointed 815 disease-linked variants overlapping isoQTLs, suggesting unannotated splicing events as hidden disease drivers.
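As a generic illustration of what a QTL test involves, the sketch below regresses an isoform abundance on genotype dosage (0, 1, or 2 copies of an allele) and reports the slope and its t-statistic. The source does not describe how the isoform QTLs were actually mapped, so this is only an assumed, simplified single-variant test; the function name `qtl_association` and the toy values are hypothetical, and real analyses typically also adjust for covariates and multiple testing.

```python
import numpy as np

def qtl_association(dosage, abundance):
    """Simple linear regression of abundance on genotype dosage.

    Returns (beta, t_stat): the estimated per-allele effect and its
    t-statistic. Hypothetical sketch with no covariate adjustment.
    """
    g = np.asarray(dosage, dtype=float)
    y = np.asarray(abundance, dtype=float)
    g = g - g.mean()
    y = y - y.mean()
    sxx = g @ g
    beta = (g @ y) / sxx
    resid = y - beta * g
    dof = len(g) - 2
    se = np.sqrt((resid @ resid) / dof / sxx)
    return beta, beta / se

# Toy data: six individuals, abundance rising with allele count.
dosage = [0, 1, 2, 0, 1, 2]
abundance = [1.0, 2.1, 2.9, 1.1, 1.9, 3.0]
beta, t_stat = qtl_association(dosage, abundance)
```

A large absolute t-statistic indicates that the variant is associated with the abundance of that isoform in the sampled individuals.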
When applied to 1,193 brain samples from the Alzheimer’s Disease Functional Genomics Consortium, Torino uncovered a global increase in intron retention tied to Alzheimer’s diagnosis and Braak stage, a semiquantitative measure of the severity of neurofibrillary tangle (NFT) pathology. Aberrant splicing was observed in the Alzheimer’s disease risk genes PTK2B and APBB3, suggesting RNA processing defects play a role in disease progression.
Looking ahead, Torino opens new directions to decode the regulatory genome across additional tissues, species, and complex diseases.