Ginkgo Bioworks has announced the launch of the Virtual Cell Pharmacology Initiative (VCPI) through Ginkgo Datapoints. This open-source platform is designed to build a standardized framework for virtual cell modeling in drug discovery by bringing together researchers, pharmaceutical companies and AI developers in a community-driven effort. The goal is to create the largest public dataset of its kind by testing at least 100,000 compounds and generating more than 12 billion data points.
While virtual cell models have emerged as a potentially valuable tool for drug discovery, the lack of standardization, reliable wet lab methods and appropriate pharmacology data prevents these models from reliably predicting how drugs will impact cells. VCPI is not the only effort to address this gap: the Arc Institute announced its inaugural Virtual Cell Challenge earlier this year.
“To develop a drug, you need measurements on drug-like molecules. VCPI focuses on generating exactly that: high-quality pharmacology data on a standardized cell line that the entire research community can build upon,” said John Androsavich, PhD, general manager of Ginkgo Datapoints. “We’re not just generating data. We’re creating the standard for how this field should develop.”
Unlike other virtual cell initiatives, which release finished datasets, VCPI will allow contributors to participate before data creation by offering high-throughput RNA profiling via Ginkgo Datapoints, free of charge.
Androsavich says the AI and biology communities have not always aligned on what makes good training data. “Current approaches rely on large quantities of low-quality data. It’s like empty calories: lots of data, but it’s noisy and may not be reproducible. We believe VCPI will prove that quality and quantity can co-exist. We’re offering both, with a method specifically designed for the pharmacology applications that matter most to drug developers,” he said.
VCPI tackles two major challenges in developing predictive virtual cell models. The first challenge is the lack of a defined “cell” in a virtual cell. As a solution, Ginkgo Datapoints presents V-Ref293, a novel engineered cell line designated specifically as the reference standard for virtual cell research. Master cell bank vials will be made available to the community in 2026, so that labs worldwide can generate comparable results.
The second challenge is that scalable wet lab methods lack a high signal-to-noise ratio. Here, Ginkgo Datapoints presents DRUG-seq, a pharma-validated, high-throughput bulk RNA sequencing method, as an alternative to single-cell approaches for generating drug screening data.
The initiative is open to participation: researchers and companies can contribute compounds for free testing, with the resulting data released publicly on a rolling basis under a Creative Commons license (CC BY 4.0). Participants may elect to contribute compounds under terms that include a period of exclusive data access prior to public release, or to reserve data indefinitely for their own use.
To encourage participation, contributors can vote on prioritization, share models, take part in future competitions and engage in a community discussion forum. Active contributors can achieve “super user” status and gain early data access.
