A team of investigators at Penn State College of Medicine reports the development of a new method that improves the mapping of genetic variants that influence the risk of neurodegenerative diseases. The method, published in Nature Communications, was created in response to long-standing challenges in connecting genome-wide association study data to specific changes in gene expression within the brain. The researchers sought a new approach because conventional bulk tissue studies mix many cell types together, and available single-cell datasets, although more precise, are small—especially for rare brain cell types that play important roles in conditions such as Alzheimer’s disease and amyotrophic lateral sclerosis (ALS).
“There’s a lot of emphasis on data generation, but relatively modest efforts devoted to better analyzing the data,” said senior author Bibo Jiang, PhD, an assistant professor of public health sciences at Penn State College of Medicine. “There’s a lot more information that could be extracted from existing data sets and our work seeks to better digest this information. It has the potential to create a new paradigm for understanding brain-related disease.”
The work was motivated by the need to better link the non-coding genetic variants often identified in genome-wide association studies to measurable differences in gene expression. “Genome-wide association studies have identified many loci for brain disorders, but most non-coding variants fail to colocalize with bulk expression quantitative trait loci. Single-cell expression quantitative trait loci studies capture cell-type-specific regulation but are often underpowered,” the researchers wrote.
To help fill this information deficit, the Penn State team developed their new approach, called BASIC (Bulk And Single cell eQTL Integration across Cell states), which combines bulk and single-cell expression quantitative trait locus (eQTL) data by modeling regulatory effects across seven brain cell types. Instead of treating each cell type separately, the method uses principal components of gene expression to establish “axis-QTLs,” which capture shared and distinct regulatory effects across cell types and substantially improve the ability to identify regulatory variants.
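The idea of testing variants against principal components rather than individual cell types can be illustrated with a toy sketch. This is not the BASIC implementation, whose statistical model is not described here. It only assumes a simple setup: simulated expression of one gene in seven cell types, principal axes taken from a singular value decomposition, and an ordinary least-squares test of each axis score against genotype. The function name `axis_qtl_scan` and all simulated values are hypothetical.

```python
import numpy as np


def axis_qtl_scan(genotype, expression):
    """Toy axis-level association test (illustrative assumption, not BASIC).

    genotype:   shape (n,), allele dosages for one variant.
    expression: shape (n, n_cell_types), one gene's expression per cell type.
    Returns (loadings, betas, t_stats), one entry per principal axis.
    """
    g = genotype - genotype.mean()
    x = expression - expression.mean(axis=0)

    # PCA via SVD: each row of vt is an axis across cell types.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u * s  # per-individual score on each axis

    # Simple linear regression of every axis score on genotype.
    n = g.shape[0]
    gg = g @ g
    betas = (g @ scores) / gg
    resid = scores - np.outer(g, betas)
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    t_stats = betas / np.sqrt(sigma2 / gg)
    return vt, betas, t_stats


# Simulated data: a variant with the same effect in all seven cell types.
rng = np.random.default_rng(0)
n_individuals, n_cell_types = 500, 7
genotype = rng.binomial(2, 0.3, size=n_individuals).astype(float)
shared_effect = 0.5  # hypothetical per-allele effect, identical across cell types
expression = (
    shared_effect * genotype[:, None]
    + rng.normal(size=(n_individuals, n_cell_types))
)

loadings, betas, t_stats = axis_qtl_scan(genotype, expression)
strongest_axis = int(np.argmax(np.abs(t_stats)))
```

In this simulation a shared effect tends to concentrate on an axis whose loadings have the same sign in every cell type, while an effect confined to one cell type would load on an axis that contrasts that cell type with the others. That is the intuition behind pooling information across cell types instead of testing each one in isolation.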
Using BASIC, the team identified 5,644 additional genes with eQTLs (a 74.5% increase) compared with single-cell data alone, a gain equivalent to increasing the sample size by 76.8%. BASIC also increased the accuracy of linking genetic variants to 12 brain-related diseases by 53% relative to single-cell analyses and 111% relative to bulk studies. The added power is especially important for analyzing cells such as microglia, which have been linked to neuroinflammation and neurodegenerative disease but are difficult to study because of their low abundance in tissue samples.
Specifically, using BASIC, the team found new genes associated with Alzheimer’s disease and ALS, including regulatory effects that conventional methods could not detect. The method also helped identify potential therapeutic compounds that could address these gene expression patterns. For example, alfacalcidol, a synthetic form of vitamin D, was identified as a potential treatment for schizophrenia, and cabergoline, a drug already approved to treat high prolactin levels, was identified as a potential candidate for Alzheimer’s disease.
While this research looked for genetic variants in brain tissue, the researchers believe BASIC could be adapted to other tissues. Additional research by the team will look to extend the model to better incorporate diverse ancestries and to better manage heterogeneity among datasets, which will be important as more functional genomic data from around the world become available.
