Research led by scientists at The Hospital for Sick Children (SickKids) has identified sequence-level changes within short tandem repeats (STRs) that appear to influence phenotypic variation within diseases and help explain why some treatments work better for some patients than for others. The findings, published in Genome Biology, could have implications for the treatment of diseases known to be caused by tandem repeats, including Huntington’s disease.
“These changes in STR composition aren’t rare, they’re a normal part of human genetic diversity. This is a new dimension of genetic variation that’s been hiding in plain sight,” said senior author Ryan Yuen, PhD, a senior scientist at SickKids.
Tandem repeats are repeated sections of DNA strands and make up roughly seven percent of the human genome. They are known to contribute to monogenic and complex disorders including fragile X syndrome, autism spectrum disorder, schizophrenia, cancer, and cardiomyopathies in addition to Huntington’s disease.
For this study, the researchers examined whether differences in the sequence motifs of STRs, not just their expansion thresholds, could shape gene regulation and contribute to variation in clinical phenotypes. Their research was influenced by prior studies showing that sequence interruptions can modulate pathogenicity in repeat-associated diseases and by new findings of disease-causing repeat insertions composed of alternative motifs.
To investigate this, the researchers analyzed the STR sequence composition of 3,150 people from two different data sets. The analysis of these short-read sequencing data was aided by a SickKids-developed algorithm that detects both repeat length and motif composition, which the team then related to gene expression in 49 different human tissue types.
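The distinction between repeat length and motif composition can be illustrated with a minimal Python sketch. This is not the SickKids algorithm, whose details are not described here; the function name `motif_composition` and both example sequences are hypothetical, chosen only to show how two repeat tracts of identical length can differ in sequence content.

```python
from collections import Counter


def motif_composition(tract: str, motif_len: int) -> Counter:
    """Count each motif-sized unit in a repeat tract, reading in frame
    from the first base. A trailing partial unit is ignored."""
    units = [
        tract[i:i + motif_len]
        for i in range(0, len(tract) - motif_len + 1, motif_len)
    ]
    return Counter(units)


# Two hypothetical tracts, each 8 units long (illustrative only).
pure = "CAG" * 8                           # a single motif throughout
mixed = "CAG" * 5 + "CAA" + "CAG" * 2      # same length, one alternative unit

for name, tract in [("pure", pure), ("mixed", mixed)]:
    composition = motif_composition(tract, 3)
    # Length alone cannot tell these two tracts apart; composition can.
    print(name, "units:", len(tract) // 3, "composition:", dict(composition))
```

A length-only measurement would report both tracts as eight units, while the composition tally separates them. A real tool working from short-read data would also need to handle alignment, sequencing error, and repeats whose reading frame is not known in advance, none of which this sketch attempts.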
This analysis showed that variable STRs exhibit clear distribution patterns that may be biologically relevant. “These variable repeats are more prone to expansion and are frequently found in proximity to Alu elements,” the researchers wrote. “Notably, STRs with variable motifs are often found near splice junctions of genes involved in brain and neuronal functions.”
The SickKids team also found variable STRs enriched at splice junctions of genes tied to “neuron,” “axon,” and “growth” functions, in brain regions such as the hippocampus, hypothalamus, nucleus accumbens, and putamen, regions that support processes related to motor control, learning, reward, language, and cognition.
The research also uncovered ethnic differences in STR variation, with a higher frequency of alternative motifs among people of African descent, and detected previously undescribed motifs in multiple regions linked to monogenic repeat disorders.
These findings could have implications for clinical care. For clinicians, knowing that STR sequence composition, rather than just the length of the repeat, can influence gene expression and disease-relevant pathways could inform diagnostic interpretation, risk assessment, and prognosis. The findings may also help explain why some patients with similar repeat lengths experience different symptom severity or respond differently to therapeutic interventions.
“We saw clear patterns, like these diverse repeats appearing in genes related to neurodevelopment and brain function,” said first author Alexandra (Sasha) Mitina, PhD, a research fellow at SickKids. “Genes affected by these variations are linked to critical biological processes and may help explain individual differences in health and disease.”
There are also potential implications for drug development targeting diseases driven by tandem repeats. For instance, if specific motif variants modulate gene expression, drugs targeting pathways influenced by motif composition could complement approaches that focus on repeat-length instability. Further, as long-read sequencing becomes more integrated in clinical genomics and decision making.
“Our approach lets us see both size and sequence composition. We’re still only scratching the surface, but these regions may hold the answers to some of the unknowns in our genome and contain potential targets for future disease studies,” Yuen said.
