What if we could simulate the cell? It is the basic unit of life, housing the intricate molecular networks that keep us alive and inviting scientists to endlessly unravel its secrets. Some see the virtual cell as the holy grail of biology research, while others argue it is outright impossible to ever come near simulating the full constellation of molecular interactions a cell harbors.
Scientists have toyed with the concept of a virtual cell for decades. However, it is only recently that advances in biology and artificial intelligence have made it possible for them to take their first steps towards simulating the cell.
Once—and if—we get there, this technology could redefine precision medicine.

“Virtual cells could dramatically accelerate drug discovery by predicting which genetic or chemical perturbations shift diseased cells back to healthy states, enabling computational screening at a fraction of the cost and time of physical experiments. They could also enable personalized treatment strategies by predicting how cells with specific genetic backgrounds respond to different therapies,” said Hani Goodarzi, PhD, core investigator at the Arc Institute and associate professor at the University of California, San Francisco.
As part of his research, Goodarzi is helping to build the Arc Virtual Cell Atlas, a publicly available collection of datasets specifically designed and curated for the development of virtual cell models. The atlas holds single-cell data from over 300 million cells and continues to grow. An essential element of these datasets is perturbation data, collected before and after a change occurs in the cell.
“Perturbational data is absolutely critical because it provides causal information, showing not just correlations but actual cause-and-effect relationships when genes are knocked out or drugs are added,” said Goodarzi. “This is what enables models to make predictions about novel perturbations rather than just describing observed patterns.”
Virtual cell models could allow scientists to run millions of experiments in parallel to predict the effects of not just novel perturbations but also combinations of them—something that would be impossible by relying solely on wet lab experiments, even with the most advanced techniques available today. The challenge remains turning this concept into a reality.
How to build a virtual cell
Where does one even start making a virtual model of a cell? Jan Ellenberg, PhD, director of the Science for Life Laboratory (SciLifeLab) in Sweden, is facing this question as he prepares for the launch of Alpha Cell, a research program dedicated to predicting cell behavior with machine learning models and slated to start in early 2026.

“It is currently computationally impossible to make an atomic-scale, dynamic model of a human cell,” said Ellenberg. “No computer in the world can run that calculation.” Instead, artificial intelligence can be leveraged to make predictions, based on experimental data, about which elements of the elaborate molecular networks in the cell are relevant for specific biological functions or diseases.
He envisions the virtual cell as a model that can simulate the cell at different scales and levels of detail. By running simple, large-scale simulations first, the model can identify opportunities for finer, more precise virtual experiments. This iterative process could spot key data gaps, guiding experimental design and aiding the discovery of new therapeutic and diagnostic targets.
“We have seen some spectacular successes using gene therapy in rare diseases, where it’s often only one protein that malfunctions or one mutation that has to be fixed,” said Ellenberg. However, most common diseases are complex and treating them effectively would require a much more comprehensive understanding of the underlying biological processes.
To that end, researchers at the SciLifeLab will be using cutting-edge microscopy methods to create detailed maps that outline where different molecules are located within the cell, how they interact with one another, and how changes in these molecules are linked to different cell states and functions. Early candidates for these simulations will be cancer and stem cells. Pinpointing events that lead to cancer could revolutionize how early the disease can be detected and treated, while studying stem cells could provide invaluable information about the fundamental transitions cells undergo to carry out different biological functions.
This knowledge could prove instrumental in addressing a significant challenge for medicine. By the time symptoms develop and a disease is diagnosed, it is often too late to fully revert the process. “Any state of disease arises from cellular malfunctions,” said Ellenberg. “If we can understand the early changes that set cells onto this trajectory, we could intercept and interfere [with disease] much earlier than we can at the moment.”

For Tiannan Guo, PhD, an associate professor at Westlake University in China, the path to simulating the human cell starts with tackling a simpler organism first. His lab has done extensive work modeling the yeast Saccharomyces cerevisiae, a single-celled organism used since ancient times to make wine, beer, and bread. With a relatively small genome and abundant perturbation data available, this unassuming yeast could be the ideal candidate for early proofs of concept that can guide the creation of more complex, human cell models down the line.
“To build a virtual cell, we need three layers of paired data,” said Guo. The first layer concerns identifying all the components found inside a cell; the second, mapping how these molecules are spatially organized within the cell; and the third, tracking how each piece of the puzzle changes over time as a response to both internal and external signals.
Following this approach, Guo and colleagues are producing vast datasets of perturbation proteomics data to feed into virtual cell models. Venturing into uncharted territory, they have taken on challenges such as developing new techniques that increase the resolution of spatial proteomics methods to precisely locate proteins within the cell.
On the road from simulating yeast to human cells, immortalized cancer cell lines could represent an important stepping stone to unlock early applications of virtual cells in drug discovery. One of the models developed by the team, called ProteinTalks, was trained on over 38 million individual protein measurements from triple-negative breast cancer cells, allowing it to predict drug efficacy and synergy effects that were then validated in patient-derived tumor xenograft models.
As technology evolves and these models continue to grow, Guo expects to see virtual cells bringing together additional data modalities such as transcriptomics, imaging, and even any relevant literature published over the years: “All the knowledge we have about how a cell works should eventually be integrated [in these models] to make more comprehensive predictions.”
The data bottleneck
Obtaining and analyzing vast amounts of high-quality data will be the number one challenge facing scientists striving to simulate the cell. Despite significant advances in spatial proteomics and imaging methods, taking longitudinal measurements of multiple molecules within a cell remains a technical challenge. For many in the field, developing high-throughput techniques that address this limitation will be a major research goal in coming years.
“A lot of the fundamental data that a model will need to make molecular predictions of the human cell is not yet available,” said Ellenberg. “That’s what we need to invest in.”
Virtual cell models will also face hard computational barriers as the number of simulated molecules and network interactions continues to grow. “We will be operating pretty much at the cutting edge of what computational technology can deliver right now,” said Ellenberg.
“We need more high-quality perturbational data showing cellular responses across diverse contexts, with better reproducibility and less technical noise,” Goodarzi concurred. “We also need rigorous benchmarks to evaluate how well these models actually predict cellular behavior.” With this goal in mind, the Arc Institute recently launched the Virtual Cell Challenge, an annual open-source competition aimed at creating benchmark datasets that researchers worldwide can use to assess the quality of virtual cell models.
Goodarzi sees the evolution of virtual cells following in the footsteps of other deep learning models such as AlphaFold, now widely used in biology research to predict protein structure. However, although the algorithms behind AlphaFold are based on decades of crystallography and sequencing information, the data necessary to simulate cells and predict their behavior is still largely missing.
Ron Alfa, MD, PhD, CEO of AI-native biotechnology company Noetik, is determined to close this data gap. Three years ago, he founded Noetik with the ambitious goal of developing AI models that simulate cells within the context of their native tissue. These models are designed to account for the complex relationships between tumor cells, immune cells, and the tumor microenvironment, all of which play an essential role in the success of cancer treatments such as immunotherapies.

By sourcing biopsy samples from cancer patients, Noetik scientists are generating paired datasets that collect spatial information from tissue staining, proteomics, and transcriptomics data for each sample. By leveraging the same kind of data that are already used in clinical workflows, Alfa believes these virtual cell models can make more accurate and actionable predictions. “You can’t really get from one cell to patients, it’s a very big leap,” he said. “We want to simulate the cell at a level that helps us develop drugs, stratify patients, and identify biomarkers.”
Putting together these datasets can be expensive and time-consuming, but the resulting models could take precision medicine to new frontiers. “We’ve never before been able to take human tissue and simulate its biology,” said Alfa. “We can now run simulations against all of our data and build a profile of each patient to understand how they are different from each other.”
Through a partnership with Agenus, Noetik is already applying these models to improve patient selection for oncology clinical trials. The team will be leveraging data from nearly 200 million cells collected from thousands of cancer patients to identify relevant, actionable biomarkers to predict which patients are most likely to respond to a clinical-stage immunotherapy being developed by Agenus.
Alfa believes these models could pave the way for researchers to start uncovering the reasons why some drugs work well for some patients but not for others—a problem that has caused many clinical trials to fail and continues to drain time and resources across the pharma industry, especially within the oncology field. “If we can predict ahead of time which patients are going to benefit from each drug, we can enroll them in the right clinical trial and give each patient the right drug for their tumor,” said Alfa.
Towards a virtual future
Regardless of whether we can ever fully simulate the human cell, virtual cells are making an impact in biology research, drug discovery, and patient stratification. “This is not a holy grail we’re pursuing,” said Alfa. “We are doing the work right now and seeing results already.”
While the virtual cell models emerging today are relatively small in scale, he expects to see them rapidly evolve to become larger and more powerful, reshaping workflows across the pharmaceutical industry and increasingly replacing wet lab work with virtual experiments. As technological barriers get knocked down and costs continue to drop, players across the field will be able to generate data at a much larger scale to feed into these models.
Further down the line, Alfa sees the models playing an essential role in a patient’s clinical journey. Cancer patients, for instance, would get a sample processed by the model to make personalized treatment decisions that adapt over time as the tumor evolves and either responds or becomes resistant to treatment. He added: “Ultimately, we want to continue running this process so that as we treat patients, we can follow the evolution of the tumor and make sure we are delivering the right drug at each point in time.”
As we venture into the future, Guo expects robotics to play an important role in the development of virtual cell models. Combining the predictive power of virtual cells with robotic experimentation could lead to the creation of auto-evolving, self-optimizing systems that can independently interrogate biology, learn from the results, and identify the next experiment that needs to be done to close relevant knowledge gaps. How fast we can get there, however, will largely depend on the resources the scientific community invests into this budding technology over the next decade.
It can be difficult to conceive what the future of virtual cells will look like even a few years from now. After all, few would have predicted just a decade ago that large language models would become so broadly adopted. Given the accelerating pace of technological development in both artificial intelligence and biology, what seems certain is that the next decade will bring massive—and potentially unexpected—advances for virtual cells.
To take that leap, Ellenberg believes that international collaborations will be essential. The scientific community will have to collectively get closer to the overarching goal of simulating the cell as no single project will be able to address all the challenges and potential use cases.
“Making the cell predictable will be a fundamental revolution for life science and medicine,” said Ellenberg. “This is not science fiction.”
Clara Rodríguez Fernández is a science journalist specializing in biotechnology, medicine, deeptech, and startup innovation. She previously worked as a reporter at Sifted and editor at Labiotech, and she holds an MRes degree in bioengineering from Imperial College London.
