Scientists have discovered that repeat expansions long thought to sit in noncoding DNA actually produce toxic proteins that drive several rare muscle and neurodegenerative diseases with genetic and clinical similarities, suggesting a new continuum of neurological diseases.
The study, published in Nature Genetics, centers on expansions of a short DNA sequence, GGC, repeated dozens to hundreds of times in tandem. These mutations belong to a larger class of genetic changes known as microsatellite repeat expansions, which cause more than 60 disorders, notably the polyglutamine (polyQ) diseases, such as Huntington’s disease, that are caused by CAG triplet repeat expansions. In many well-known cases, repeats expand inside conventional protein-coding regions, producing abnormally long proteins that misfold and aggregate. But in the diseases examined here, the GGC expansions occur in genomic regions annotated as noncoding, which has long left open the question of how they cause pathology.
The conditions linked to these mutations include oculopharyngodistal myopathy (OPDM), a rare adult-onset muscle disorder marked by drooping eyelids, difficulty moving the eyes, swallowing problems, and progressive weakness of facial and distal limb muscles. A related disorder, oculopharyngeal myopathy with leukoencephalopathy (OPML), combines similar muscle symptoms with degeneration of brain white matter. Neuronal intranuclear inclusion disease (NIID) primarily affects the nervous system and can cause tremor, ataxia, neuropathy, cognitive changes, and muscle weakness. Although clinically distinct, these disorders share GGC repeat expansions in genes such as GIPC1, RILPL1 and NOTCH2NLC.
The study, a collaborative effort primarily between researchers at Université de Strasbourg and Peking University First Hospital, shows that the GGC repeats are embedded within previously unrecognized open reading frames (ORFs)—short stretches of RNA capable of being translated into protein. Under normal circumstances, these hidden ORFs produce tiny, unstable microproteins that are rapidly degraded. When the GGC repeat expands beyond roughly 50 copies, however, the sequence is translated into a long chain of glycine amino acids. Because each GGC codon encodes glycine, the mutation generates polyglycine, or polyG, proteins.
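The translation step described above can be illustrated with a minimal Python sketch. It is a toy model rather than the study's method: the ORF built here is invented for illustration (a start codon, the GGC repeats, one arbitrary flanking codon, and a stop codon), the function names are hypothetical, and the cutoff of 50 simply reuses the approximate figure quoted in this article.

```python
# Minimal codon table: only the codons used in this toy example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GGC": "G",  # glycine
    "CCG": "P",  # proline (arbitrary flanking codon, an assumption)
    "TAA": "*",  # stop
}

# Approximate threshold quoted in the article ("roughly 50 copies").
EXPANSION_THRESHOLD = 50


def build_toy_orf(repeat_count: int) -> str:
    """Return a made-up ORF: start codon, GGC repeats, one flanking codon, stop."""
    return "ATG" + "GGC" * repeat_count + "CCG" + "TAA"


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def longest_glycine_run(protein: str) -> int:
    """Length of the longest uninterrupted stretch of glycine (G) residues."""
    longest = current = 0
    for residue in protein:
        current = current + 1 if residue == "G" else 0
        longest = max(longest, current)
    return longest


def is_expanded(repeat_count: int) -> bool:
    """True when the repeat count exceeds the approximate threshold."""
    return repeat_count > EXPANSION_THRESHOLD


# Compare a short repeat tract with an expanded one.
for n in (10, 80):
    protein = translate(build_toy_orf(n))
    print(n, longest_glycine_run(protein), is_expanded(n))
```

Because every GGC codon maps to glycine, the glycine tract in the translated product grows one residue per added repeat, which is why a longer repeat yields a longer polyG stretch.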
These expanded polyG proteins are stable and prone to aggregation. In muscle biopsies from people with OPDM and OPML, researchers detected the newly identified polyG proteins within the hallmark rimmed vacuoles and rare eosinophilic intranuclear inclusions, which are p62-positive and ubiquitin-positive but whose origin and composition had been unknown. Similar inclusions appear in the nervous system in NIID patients. Each disease subtype produces a distinct polyG protein depending on which gene hosts the expansion, but all share a glycine-rich core.
Experiments in cultured human muscle cells demonstrated that expressing expanded polyG proteins leads to the formation of cytoplasmic and nuclear aggregates and ultimately to cell death. Importantly, repeat-containing RNA that was engineered so it could not be translated into protein was not toxic, strongly indicating that the protein product—not the RNA alone—is the primary driver of disease.
Animal studies reinforced this conclusion. Mice engineered to express polyG proteins in skeletal muscle developed progressive muscle fiber atrophy and accumulated p62-positive inclusions that resemble those seen in patients. When expressed in the central nervous system, the proteins triggered neuroinflammation, loss of cerebellar Purkinje cells, impaired motor coordination, and shortened lifespan—features consistent with NIID. Although all polyG proteins shared a central glycine repeat, their surrounding amino acid sequences—derived from their host genes—strongly influenced how they localized within cells and how toxic they became.
The findings position OPDM, OPML, and NIID within a broader emerging class of polyglycine, or “polyG,” diseases. These disorders parallel polyglutamine diseases such as Huntington’s, in which repeat expansions generate homopolymeric amino acid stretches that misfold and damage cells. The crucial difference is that, in this case, the toxic proteins arise from genomic regions once believed incapable of encoding them.
The discovery also opens a potential therapeutic avenue. Researchers identified a small molecule called TMPyP4 that binds GC-rich sequences and reduces production and aggregation of polyG proteins in cells and fruit fly models. TMPyP4 is a cationic porphyrin that is also known to inhibit human telomerase and to stack with G-tetrads, stabilizing quadruplex DNA. By interfering with translation of the expanded repeats, the compound offers proof of principle that targeting repeat-driven protein synthesis could mitigate disease.
Beyond its clinical implications, the work challenges the conventional boundary between coding and noncoding DNA. It suggests that the human genome contains many small, overlooked ORFs that can become pathogenic when destabilized by repeat expansion. In these diseases, the mutation does not simply disrupt existing genes—it exposes hidden ones, revealing a new mechanism by which repeat expansions can produce toxic proteins and drive degeneration of muscle and brain.
