A next-generation, genome-wide virtual screening engine can rapidly identify potential targets for drug treatment at a previously unimagined scale and speed.
DrugCLIP has achieved the first genome-scale virtual screening for human targets, covering over 10,000 human protein targets with a 500 million compound library.
The system, outlined in Science, uses an AI contrastive deep learning framework to rapidly identify small-molecule ligands for every druggable target in the human genome.
The team behind it have made their genome-scale virtual screening database freely available to researchers worldwide at drugclip.com, with one-click access and no coding required.
The method marks a milestone as virtual screening enters the ultra-high-throughput era, according to lead investigator Lei Liu, PhD, from Tsinghua University, and co-workers.
“From target to clinic, DrugCLIP is shortening the distance to ‘hope,’” the team maintained.
“This is where AI meets drug discovery, and the starting point of next-generation drug development. Faster, more precise, more accessible.”
AI is leading a revolution in drug discovery, but the speed of traditional screening has become a bottleneck for progress.
Screening a billion compounds for a single target using traditional docking requires over two weeks even with 10,000 central processing unit cores. This makes it enormously time-consuming to virtually screen the numerous targets in the human genome.
To accelerate the process, Liu and team developed DrugCLIP as a ground-breaking, ultra-fast AI virtual screening engine powered by contrastive learning.
DrugCLIP encodes protein pockets and small molecules into a shared latent space using both large-scale synthetic data and experimentally determined protein-ligand complex structures.
It then uses dense retrieval to instantly identify potential active molecules, eliminating the need for one-to-one docking.
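The retrieval step described above can be illustrated with a minimal sketch. It assumes pocket and molecule embeddings have already been produced by trained encoders and simply ranks molecules by cosine similarity to one pocket. The function names, the three-dimensional vectors, and the molecule labels are all hypothetical and are not taken from DrugCLIP itself.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k_molecules(pocket_embedding, molecule_embeddings, k=2):
    """Rank precomputed molecule embeddings by cosine similarity to one pocket."""
    query = normalize(pocket_embedding)
    scored = []
    for name, emb in molecule_embeddings.items():
        unit = normalize(emb)
        score = sum(q * m for q, m in zip(query, unit))
        scored.append((name, score))
    # Highest similarity first; no per-pair docking simulation is run.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy, made-up embeddings (assumption): real ones would come from trained encoders.
toy_library = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [0.7, 0.7, 0.1],
}
toy_pocket = [1.0, 0.2, 0.0]

hits = top_k_molecules(toy_pocket, toy_library, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Because the molecule embeddings can be computed once and reused for every pocket, the per-target cost in a scheme like this reduces to a similarity search rather than a fresh docking run for each compound.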
DrugCLIP outperformed traditional docking and machine learning methods even with noisy structures, novel targets, apo structures and AlphaFold-predicted structures, while still delivering precision and efficacy.
The system was applied to around 10,000 human proteins against 500 million compounds, scoring more than 10 trillion protein-ligand pairs in under 24 hours using only eight graphics processing units.
The screen presented more than two million candidate molecules covering around 20,000 pockets, representing around half the human genome.
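As a rough check on the scale of these figures, the short calculation below uses only the rounded numbers reported in the article. It treats "around 20,000 pockets", "under 24 hours" and "over two weeks" as exact values, so the results are order-of-magnitude estimates rather than measured benchmarks, and the variable names are purely illustrative.

```python
# Rounded figures quoted in the article, treated as exact (assumption).
pockets = 20_000
compounds = 500_000_000
gpus = 8
screen_seconds = 24 * 60 * 60           # "under 24 hours", so an upper bound on time

pairs = pockets * compounds              # pocket-compound pairs scored
pairs_per_second = pairs / screen_seconds
pairs_per_gpu_second = pairs_per_second / gpus

# Traditional docking figures quoted earlier in the article.
docking_compounds = 1_000_000_000
docking_cores = 10_000
docking_seconds = 14 * 24 * 60 * 60      # "over two weeks", so a lower bound on time
core_seconds_per_compound = docking_seconds * docking_cores / docking_compounds

print(f"pairs scored: {pairs:.2e}")
print(f"pairs per GPU per second (at least): {pairs_per_gpu_second:,.0f}")
print(f"docking core-seconds per compound (at least): {core_seconds_per_compound:.3f}")
```

On these assumptions, 20,000 pockets against 500 million compounds gives the 10 trillion pairs cited, which works out to more than 14 million pairs per GPU per second, whereas the docking figures imply roughly 12 core-seconds for every compound.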
The researchers performed biological validation of their virtual screening system on the norepinephrine transporter, with 15% of results wet-lab validated as effective inhibitors and 12 compounds showing binding activity superior to the antidepressant bupropion.
The team concluded: “The integration of ultrafast virtual screening frameworks, such as DrugCLIP, with emerging structural modeling and affinity prediction technologies will enable deeper systematic drug discovery across the human genome.
“This convergence can offer a more precise map of the druggable genome and provide a foundation for accelerating future drug discovery efforts.”
