For decades, scientists have relied on structure to understand protein function. Tools like AlphaFold have revolutionized how researchers predict and design folded proteins, allowing for new therapeutics and enzymes to be modeled in silico. But what happens when proteins don’t fold at all? Nearly 30% of the human proteome consists of shapeshifting, intrinsically disordered proteins (IDPs) that refuse to settle into a stable structure—and have remained beyond the reach of AI-based prediction tools.
Now, researchers at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Northwestern University have developed a new way to bring order to this biological chaos. Their physics-based machine learning framework can design IDPs with custom properties, paving the way for deeper understanding of their biological roles and potential therapeutic use. The study, titled “Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins,” was published in Nature Computational Science.
Unlike traditional protein design methods that depend on 3D structural templates, this new approach embraces disorder. “We needed to either come up with better AI models, or, we needed to come up with a way to actually take those physics models where you not only get good predictions, but you also get the physics for free,” said Krishna Shrinivas, PhD, senior author and assistant professor of chemical and biological engineering at Northwestern.
At the heart of the method is automatic differentiation, the technique that deep learning frameworks use to compute exact derivatives of the output of a computation with respect to its inputs. Applied here, it lets the algorithm perform gradient-based optimization through molecular dynamics simulations to identify amino acid sequences that exhibit desired ensemble behaviors.
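To make the idea concrete, here is a minimal sketch of gradient-based optimization driven by automatic differentiation. It is a toy illustration under stated assumptions, not the researchers' framework: a simple quadratic function stands in for the molecular dynamics simulation, a single number stands in for the sequence, and the names `Dual`, `toy_observable`, `loss`, and `optimize` are hypothetical. The derivative is obtained with forward-mode automatic differentiation using dual numbers, written in plain Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (forward-mode autodiff)."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def toy_observable(x):
    """Hypothetical stand-in for a simulated ensemble property."""
    return 0.5 * x * x + 2.0 * x + 1.0


def loss(x, target):
    """Squared distance between the observable and the desired value."""
    diff = toy_observable(x) - target
    return diff * diff


def optimize(x0, target, lr=0.01, steps=500):
    """Plain gradient descent; the gradient comes from autodiff."""
    x = x0
    for _ in range(steps):
        # Seed the input with derivative 1 to get d(loss)/dx in `.dot`.
        grad = loss(Dual(x, 1.0), target).dot
        x -= lr * grad
    return x


x_opt = optimize(x0=0.0, target=7.0)
print(x_opt, toy_observable(x_opt))
```

No derivative formula is written by hand for `toy_observable`: the gradient falls out of running the function itself on `Dual` inputs, so the same loop would work unchanged if the toy function were replaced by any other computation built from the supported operations.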
The researchers compare the process to a powerful search engine for amino acid sequences. By iteratively simulating and adjusting sequences, the framework learns how even single amino acid changes affect a protein’s overall behavior without relying on vast experimental datasets. “We didn’t want to have to take a bunch of data and train a machine learning model to design proteins,” said Ryan Krueger, a SEAS graduate student and co-lead author. “We wanted to leverage existing, sufficiently accurate simulations to be able to design proteins at the level of those simulations.”
Using this framework, the team successfully designed disordered proteins that act as molecular loops, linkers, and environmental sensors capable of responding to changes in salt concentration, temperature, or phosphorylation. These behaviors mirror how IDPs function in cells, serving as flexible connectors or switches that regulate signaling and assembly.
By fusing machine learning with physical simulations, the method sidesteps the limitations of purely data-driven models, offering a generalizable framework for designing proteins that defy structural conventions. As Shrinivas and colleagues noted, future work may extend this approach to other complex biomolecules, including RNA and DNA, where flexibility drives biological function.