Californian researchers believe they have found a new way to reveal the biological mechanisms that link genes to human traits and diseases.
The tool, outlined in Nature, may help in understanding the genetic basis of complex traits that have so far eluded genome-wide association studies (GWAS).
The approach blends estimates of gene-trait relationships from loss-of-function (LoF) burden tests with advances in genome editing and high-throughput single-cell RNA sequencing represented by the Perturb-seq method.
This creates a graph that reveals trait-relevant pathways as well as the functions of genes and programs, helping to explain why particular genes are associated with traits.
“Although our work focuses on blood traits that underlie anemia and related diseases, we anticipate that the principles learned here can be broadly applicable,” reported the researchers, led by Mineto Ota, PhD, from Stanford University.
GWAS and rare variant burden tests have identified tens of thousands of reproducible associations for a wide range of traits and diseases.
These signals have identified many genes that can serve as therapeutic targets and have driven discoveries of new molecular mechanisms, critical cell types, and physiological pathways of disease risks or traits, as well as enabling genetic risk prediction for complex diseases.
Yet around two decades after the first GWAS, genome-scale approaches still do not exist that can infer interpretable, quantitative models of the biological pathways that connect genes to cellular functions to traits.
And aside from coarse-grained analyses such as identifying trait-relevant cell types and enriched gene sets, there remains a lack of genome-scale approaches for interpreting the molecular pathways and mechanisms through which hundreds or even thousands of genes affect a given phenotype.
Ota and team examined whether advanced genetic analysis involving methods such as Perturb-seq could provide new opportunities to measure causal gene-regulatory connections at genome scale.
Specifically, they investigated whether approaches combining genetic association and Perturb-seq data could shed light on how genetic variants link to functional programs and, in turn, to traits.
In a proof-of-concept study, the researchers constructed a causal graph of the gene-regulatory hierarchy that jointly controls three partially co-regulated blood traits.
Firstly, they examined whether there are any traits with high-quality genetic data where the most relevant cell type could be well modeled by existing Perturb-seq data.
At the time of their research, the only published genome-wide Perturb-seq dataset had been collected in the K562 leukemia cell line. In that experiment, every expressed gene was knocked down using CRISPR interference, one gene per cell, before single-cell RNA sequencing.
To determine which traits could reasonably be modeled in terms of the gene-regulatory networks of K562 cells, the team then compiled published GWAS and LoF burden test data for a wide range of traits measured in the UK Biobank.
Based on this, the team chose to study three traits: mean corpuscular hemoglobin (MCH), which measures the mean amount of hemoglobin per erythrocyte; red cell distribution width (RDW), which is the standard deviation of the size of erythrocytes per individual; and the immature reticulocyte fraction (IRF).
For these traits, a considerable amount of SNP heritability was explained by open chromatin regions in the K562 cell line, representing 53%, 44% and 36% of the total SNP heritability, respectively.
The investigators then built graphs that incorporated quantitative gene effects estimated from LoF burden tests instead of unsigned enrichment of GWAS hits.
They found that, unlike GWAS hits, LoF effect sizes are inherently directional, are automatically linked to the correct genes, and have magnitudes that are comparable across genes.
In addition, the researchers said that, compared with common variants with tiny effects, LoFs were probably functionally more similar to CRISPR knockdowns, given the widespread non-linear and even non-monotonic relationships between gene expression and phenotypes.
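To give a rough sense of why signed, comparable effect sizes are useful, the toy sketch below relates per-gene LoF burden effects to per-gene knockdown effects on a few expression programs using ordinary least squares. It is an illustration only and not the researchers' method: the data are synthetic, and the function name, array names, number of programs, and effect values are all assumptions made for the example.

```python
import numpy as np


def estimate_program_effects(knockdown_effects, lof_effects):
    """Least-squares estimate of each program's signed effect on a trait.

    knockdown_effects: (n_genes, n_programs) array, the signed change in each
        program's activity when a gene is knocked down (hypothetical
        Perturb-seq summary).
    lof_effects: (n_genes,) array, the signed LoF burden effect of each gene
        on the trait (hypothetical).
    """
    coef, *_ = np.linalg.lstsq(knockdown_effects, lof_effects, rcond=None)
    return coef


# Synthetic toy data: none of these numbers come from the study.
rng = np.random.default_rng(0)
n_genes, n_programs = 200, 3
knockdown_effects = rng.normal(size=(n_genes, n_programs))

# Assumed "true" program-to-trait effects used to simulate the LoF effects.
true_program_effects = np.array([0.5, -0.3, 0.0])
noise = rng.normal(scale=0.05, size=n_genes)
lof_effects = knockdown_effects @ true_program_effects + noise

estimated = estimate_program_effects(knockdown_effects, lof_effects)
print(np.round(estimated, 2))
```

Because both inputs carry a sign and a comparable magnitude, the fit can recover the direction in which each simulated program pushes the trait, which an unsigned enrichment of hits could not do.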
They concluded: “Although our proof of principle here uses experimental data from K562 cells to model erythrocyte traits, we expect that the next generation of perturbation studies in cells, organoids and tissues will provide a critical interpretative framework for human genetics.”
