Deciphering gene circuits can be tedious and immensely time-consuming. Modifying or designing gene circuits based on previously identified pathways presents further challenges.
“There are many possible designs for any given function, and finding the right one can be like looking for a needle in a haystack,” said Caleb Bashor, PhD, a scientist at Rice University and lead author of a study describing a technique for finding useful gene circuits, or DNA designs, far faster than researchers have historically been able to.
The study was published in Nature in a paper titled “Ultra-high throughput mapping of genetic design space.”
“We created a new technique that makes hundreds of thousands to millions of DNA designs all at once—more than ever before,” Bashor said.
Researchers in the Bashor lab, including co-first authors Kshitij Rai, PhD, and Ronan O’Connell, PhD, who were graduate students at the time, worked with interdisciplinary collaborators, among them physicists and computer scientists, to develop a technique called CLASSIC, short for “combining long- and short-range sequencing to investigate genetic complexity.” With CLASSIC, the team used AI and machine learning (ML) to design these circuits.
“Our work is the first demonstration you can use AI for designing these circuits,” said Bashor.
Using both long-read and short-read next-generation sequencing (NGS), the team was able to create detailed maps of long stretches of DNA sequence. Long-read sequencing produced a large amount of DNA data, but the process was slow and the results could be noisy. Short-read sequencing, run alongside it, reduced the noise and sharpened the sequences.
“Most people do one or the other, but we found using both together unlocked our ability to build and test the libraries,” said O’Connell.
“We invented a way to do this in large batches, which allowed us to make really large sets—known as ‘libraries’—of circuits,” Rai added.
The team created a library of proof-of-concept gene circuits, each incorporating a fluorescent reporter gene. Each circuit was tagged with a short DNA barcode, and its complete sequence was determined.
These gene circuits were inserted into human embryonic kidney cells, which were then analyzed for fluorescence. The cells were sorted by fluorescence brightness, and short-read NGS was used to read the DNA barcodes in the sorted populations. The result was a master map linking each circuit’s genotype to its reporter-expression phenotype.
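To make the barcode-to-phenotype step more concrete, here is a minimal sketch of how such a map could be assembled, assuming two hypothetical inputs: a barcode-to-design table (the kind of link long reads spanning both barcode and circuit can provide) and per-bin barcode counts from short reads of the sorted cells. All function and variable names are illustrative, not taken from the study, and the estimate is a simple count-weighted average of bin brightness; real sort-and-sequence pipelines typically also normalize counts by the number of cells sorted into each bin, which this sketch omits.

```python
from collections import defaultdict


def estimate_expression(barcode_to_design, bin_counts, bin_brightness):
    """Return {design: count-weighted mean brightness} pooled over all of its barcodes.

    Hypothetical inputs (illustration only, not the study's pipeline):
      barcode_to_design: {barcode: design_id}, e.g. from long reads covering barcode + circuit
      bin_counts:        {bin_id: {barcode: read_count}}, e.g. from short reads of each sort bin
      bin_brightness:    {bin_id: representative fluorescence value for that bin}
    """
    weighted_sum = defaultdict(float)
    total_count = defaultdict(int)
    for bin_id, counts in bin_counts.items():
        for barcode, n in counts.items():
            design = barcode_to_design.get(barcode)
            if design is None:
                continue  # barcode never linked to a full-length circuit; skip it
            weighted_sum[design] += n * bin_brightness[bin_id]
            total_count[design] += n
    return {d: weighted_sum[d] / total_count[d] for d in total_count if total_count[d] > 0}


# Toy example with made-up barcodes, designs, and bin values.
barcode_to_design = {"AAC": "design_1", "GTT": "design_1", "CCA": "design_2"}
bin_brightness = {"low": 1.0, "mid": 10.0, "high": 100.0}
bin_counts = {
    "low": {"AAC": 5, "CCA": 1},
    "mid": {"GTT": 5, "CCA": 1},
    "high": {"CCA": 8, "TTT": 3},  # TTT has no long-read match and is ignored
}
print(estimate_expression(barcode_to_design, bin_counts, bin_brightness))
```

Pooling every barcode that maps to the same design is one way a single circuit could receive a measurement even when its reads are spread across several barcodes and bins.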
“We end up with measurements for a lot of the possible designs but not all of them, and that is where building the ML model comes in,” O’Connell said.
These data were then used to train a model to predict the behavior of designs that were not in the original dataset. O’Connell explained that these predictions were then tested in follow-up experiments. “We have all of these predictions—let’s see if they’re correct.”
Follow-up experiments and manual checks of small, randomly chosen data sets suggested that CLASSIC was working well.
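The article does not describe the model the team trained, so the sketch below swaps in the simplest technique that illustrates the idea of learning from measured designs to predict unmeasured ones: k-nearest-neighbor regression, where each design is a tuple of part choices and distance is the number of positions that differ. This is an assumption for illustration, not the study’s method, and the part names and values are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length designs use different parts."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(measured, query, k=3):
    """Predict a query design's expression as the mean of its k nearest measured designs.

    Stand-in model for illustration only (k-nearest-neighbor regression); the study's
    actual ML model is not described here.
    measured: {design_tuple: measured_expression}
    """
    if not measured:
        raise ValueError("need at least one measured design")
    # sorted() is stable, so distance ties keep the insertion order of `measured`.
    ranked = sorted(measured.items(), key=lambda item: hamming(item[0], query))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical designs: (promoter, activator, binding-site count) with made-up expression values.
measured = {
    ("pA", "zf1", "2x"): 10.0,
    ("pA", "zf2", "2x"): 20.0,
    ("pB", "zf1", "4x"): 40.0,
    ("pB", "zf2", "4x"): 80.0,
}
untested = ("pA", "zf1", "4x")
print(knn_predict(measured, untested, k=2))
```

Here the untested design differs by one part from two measured designs, so with `k=2` its prediction is the mean of those two. A model like this only works as well as the coverage of the measured library, which is one reason library size matters for prediction.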
“We started lining them up, and first one worked, then another … and then they just started hitting,” Rai said. “All 40 of them matched perfectly. That’s when we knew we had something.”
“This was the first time AI/ML could be used to analyze circuits and make accurate predictions for untested ones, because up to this point, nobody could build libraries as large as ours,” he continued.
CLASSIC can analyze large datasets, develop detailed circuits, and accurately predict how untested gene circuits will behave, suggesting that the approach will be useful for developing novel gene circuit designs. Using CLASSIC, the team also found that circuit design is flexible: many different designs can produce the same outcome.
“This is akin to navigation apps: There are multiple routes to reach your destination, some highways, some backroads, but all get you to your destination,” O’Connell said.
The technique is well suited to combining high-throughput circuit characterization with AI-driven design in synthetic biology and biotechnology.
“We think AI/ML-driven design is the future of synthetic biology,” Bashor said. “As we collect more data using CLASSIC, we can train more complex models to make predictions for how to design even more sophisticated and useful cellular biotechnology.”
James Collins, DPhil, of the Massachusetts Institute of Technology, an early researcher in and founder of synthetic biology, concurs. “Twenty-five years ago, those early circuits showed that we could program living cells, but they were built one at a time, each requiring months of tuning,” he said. Though he was not involved in the study, he commented that this research has “delivered a transformative leap.”
“CLASSIC brings high-throughput engineering to gene circuit design, allowing exploration of combinatorial spaces that were previously out of reach. Their platform doesn’t just accelerate the design-build-test-learn cycle; it redefines its scale, marking a new era of data-driven synthetic biology,” he said.
Michael Elowitz, PhD, another prominent synthetic biology researcher and a 2007 MacArthur Fellow, said that “synthetic biologists have dreamed of programming cells by snapping together biological circuits from interacting genes and proteins.”
He noted that understanding and engineering biological components and circuits in fine detail has been a challenge in the field, but said that this work “demonstrates how we can systematically explore biological design space and makes biological engineering more predictable. In the future, it will be exciting to generalize this approach to other interactions and components, bringing us closer to making cells fully programmable.”
