A preprint posted by researchers at artificial intelligence (AI)-based drug developer Insilico Medicine plots a two-pronged roadmap that aims to address two key questions in drug discovery: How can companies efficiently search the universe of more than 10⁶⁰ potential drug molecules for new treatments? And how can those companies prevent their discoveries from being the basis of “me too” and “me better” drugs patented by competitors?
In “Molecular LEGION: Latent Enumeration, Generation, Integration, Optimization and Navigation. A case study of incalculably large chemical space coverage around the NLRP3 target,” posted on ChemRxiv, a team of 11 Insilico researchers unveiled their AI-driven workflow designed to answer both questions, called LEGION—short for Latent Enumeration, Generation, Integration, Optimization, and Navigation.
The researchers applied LEGION to NLRP3, a protein strongly tied to inflammation in tissues throughout the body—and the target of Insilico’s oral NLRP3 inhibitor ISM8969, which is being developed to treat Parkinson’s disease and completed IND-enabling studies in August.
Insilico plans to submit an Investigational New Drug (IND) application later this quarter to begin clinical studies of ISM8969, which it envisions as a potentially best-in-class, brain-penetrant, and safe treatment for numerous other diseases in which NLRP3 has been implicated, including arthritis and heart disease.
Given the range of disease targets with large patient populations, Insilico is among those that envision NLRP3 becoming as potentially lucrative a target for new drugs as glucagon-like peptide 1 (GLP-1) receptor agonists. Current forecasts do not yet show that parity: the GLP-1 market is projected to skyrocket from $47.4 billion in 2024 to $471.1 billion by 2031 (MarketsandMarkets), a compound annual growth rate (CAGR) of 33.2%, while the NLRP3 market is expected to multiply, but only from $1.12 billion in 2024 to $5.43 billion in 2033 (DataIntelo), a CAGR of 18.3%.
Covering the universe
Insilico says LEGION is designed to cover the drug molecule universe fully enough to prevent rivals from using it to patent their own knockoffs, and to expand the reach of generative chemistry tools to disclose, then defend, vast regions of that chemical space.
“The ‘play’ is both generating new, AI-designed leads and deliberately disclosing large, related families to harden IP [intellectual property] around novel series,” said Alex Zhavoronkov, PhD, the preprint’s co-corresponding author and Insilico’s founder, chairman, executive director, and CEO.
Insilico researchers applied LEGION to generate more than 123 billion new molecular structures in a proof-of-concept test, in the process uncovering, in hours rather than months, tens of thousands of promising “scaffolds,” or core molecular structures.
That outcome, according to the company, shows AI’s ability to dramatically shorten drug discovery time, a key promise of the technology. Another key promise, cost savings, was not quantified in the preprint, as researchers opted instead to quantify impact via runtime, library scale, and virtual hit-rate improvement.
“We worked from a sampled, tractable subset, ran them through our 2D/3D filters and pharmacophore-aware screening in Chemistry42, and advanced only the highest-value series to medicinal chemists for prioritization and potential synthesis,” Zhavoronkov said. Chemistry42 is Insilico’s generative chemistry engine.
“LEGION strengthens IP positions and reduces fast follower risk, an important cost driver downstream,” he added.
Insilico open-sourced a subset of more than 120 million molecules designed to target NLRP3. The public disclosure is intended to make regions of chemical space far harder to patent.
“Taking the 120 million NLRP3-related molecules public doesn’t make NLRP3 unpatentable, but it makes those disclosed structures and large neighborhoods around them far, far harder for fast followers to claim as new IP, because the space is now so widely publicly mapped and defended,” Zhavoronkov said.
“The strategy is to use LEGION to maximize scaffold diversity, generate large families around each scaffold, and disclose at scale so competitors can no longer claim that ground, closing off the usual scaffold-hopping routes,” he explained. “In practice, another company could still patent genuinely distinct chemotypes outside the disclosed regions, but our approach is to cover chemical space so comprehensively around the target that typical ‘me-too/me-better’ variants become unpatentable or at least much harder to patent.”
Identifying scaffolds
Researchers applied LEGION to identify more than 34,000 unique scaffolds found to have potential to bind NLRP3, through a combination of generative AI tools for designing new molecules and AI-based screening of the massive databases of previously identified molecules.
“These scaffolds include key pharmacophore functional groups essential for interacting with NLRP3, effectively occupying the binding pocket, and providing optimal vectors for chemical substitutions,” the researchers reported. “This approach ensures that the generated compounds exhibit both binding relevance and structural validity, bridging the gap between 2D generative models and meaningful 3D chemical space exploration.”
The number of scaffolds grew to nearly 94,000 final scaffolds after the investigators systematically replaced attachment points on the complex scaffolds with common drug side chains, limiting the number of free attachment points in a scaffold-simplifying step.
Those roughly 94,000 scaffolds were fed into Chemistry42 to add to and modify the makeup of the structures and their side chains, yielding 6.5 million virtual compounds. Researchers also subjected a subset of the scaffolds with two attachment points to a mixing-and-matching step, or “combinatorial explosion,” that resulted in more than 100 million additional 2D molecular structures.
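The mixing-and-matching idea can be pictured with a toy sketch. It assumes scaffolds are written as strings with two placeholder tokens, `[*:1]` and `[*:2]`, and that side chains are simple fragment strings. The function names, the fragment lists, and the side-chain count are hypothetical, and plain string substitution stands in for real chemistry handling; this is not the Chemistry42 implementation.

```python
from itertools import product

def enumerate_two_point(scaffolds, side_chains):
    """Yield every scaffold with both attachment points filled.

    Toy model: each scaffold string carries the placeholders
    "[*:1]" and "[*:2]" (an assumption made for this sketch).
    """
    for scaffold, r1, r2 in product(scaffolds, side_chains, side_chains):
        yield scaffold.replace("[*:1]", r1).replace("[*:2]", r2)

def library_size(n_scaffolds, n_side_chains, n_points=2):
    """Number of structures produced without enumerating them."""
    return n_scaffolds * n_side_chains ** n_points

# Hypothetical toy inputs, not taken from the preprint.
toy_scaffolds = ["c1ccc([*:1])cc1[*:2]", "C1CN([*:1])CCN1[*:2]"]
toy_side_chains = ["C", "O", "N", "F"]

toy_library = list(enumerate_two_point(toy_scaffolds, toy_side_chains))
print(len(toy_library))  # 2 scaffolds x 4 x 4 side-chain pairs

# With an illustrative (assumed) 1,000 side chains per attachment point,
# about 12,000 two-point scaffolds already give billions of structures.
print(library_size(12_000, 1_000))
```

The count grows with the square of the side-chain list for two attachment points, which is why the output of such a step quickly outruns what downstream tools can score directly.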
By creating a random sample of the 123 billion total structures generated with combinatorial explosion of about 12,000 scaffolds, scientists had a feasible number of structures for their other analytical tools to handle—although more of the total generated volume could be used for follow-up as more powerful computational systems are developed in the future.
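One way to draw a random sample from a library too large to store is to sample flat indices and decode each one into a (scaffold, side chain, side chain) triple. The sketch below illustrates that idea under assumed sizes; the function names and the numbers are hypothetical, and the preprint's actual sampling procedure may differ.

```python
import random

def decode_index(index, n_side_chains):
    """Map a flat library index to (scaffold_id, r1_id, r2_id)."""
    scaffold_id, rest = divmod(index, n_side_chains ** 2)
    r1_id, r2_id = divmod(rest, n_side_chains)
    return scaffold_id, r1_id, r2_id

def sample_library(n_scaffolds, n_side_chains, k, seed=0):
    """Draw k distinct members of the virtual library uniformly at random.

    Only k indices are ever held in memory; the full library is
    never materialized.
    """
    total = n_scaffolds * n_side_chains ** 2
    rng = random.Random(seed)
    indices = rng.sample(range(total), k)
    return [decode_index(i, n_side_chains) for i in indices]

# Assumed sizes for illustration only.
picks = sample_library(n_scaffolds=12_000, n_side_chains=1_000, k=5)
for scaffold_id, r1_id, r2_id in picks:
    print(scaffold_id, r1_id, r2_id)
```

Because each index decodes to exactly one triple, the sample is uniform over the virtual library, and a larger sample can be drawn later as more computing capacity becomes available.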
“When we rescreened representative libraries, ~60% of 2D-generated molecules extrapolated to 3D virtual hits versus 8–26% for the combinatorial expansion. This is evidence that LEGION prioritizes synthesis-worthy ideas and reduces low-value cycles,” Zhavoronkov commented.
Researchers acknowledged limitations that included the critical role for human medicinal chemists in scoring, prioritizing, and testing the molecules proposed via AI. Also, LEGION is heavily reliant on structural data about the target protein, such as 3D crystal structures and known ligand interactions. Coverage would be less extensive for targets without deep structural information.
“To push beyond that constraint, LEGION was designed to be extensible since Chemistry42 lets us train and plug in target-specific predictive models and expand input libraries, and the Reward/validation components can be upgraded with more physics-based scoring as it becomes tractable,” Zhavoronkov said. “Pharmacophore scoring continues to act as a guardrail against irrelevant enumeration.”
“We haven’t announced a branded ‘LEGION 2.0,’ but the paper lays out this upgrade path, which we’re applying as we move forward,” he added.