Researchers from Integra Therapeutics, in partnership with Pompeu Fabra University (UPF) and the Center for Genomic Regulation (CRG), Spain, have used generative AI to design synthetic proteins that outperform naturally occurring proteins used for editing the human genome. The work focused on PiggyBac transposases, naturally occurring enzymes that have long been used for gene delivery and genetic engineering, and also uncovered more than 13,000 previously unidentified PiggyBac sequences. The research, published in Nature Biotechnology, has the potential to improve current gene editing tools for the creation of CAR T and gene therapies.
“Our work expands the phylogenetic tree of PiggyBac transposons by two orders of magnitude, unveiling a previously unexplored diversity within this family of mobile genetic elements,” the researchers wrote.
The researchers first conducted extensive computational bioprospecting, screening more than 31,000 eukaryotic genomes to uncover the 13,000 new sequences. From this set, the team validated 10 active transposases, two of which showed activity comparable to that of the PiggyBac transposases currently used in both research and clinical settings.
PiggyBac transposases are enzymes encoded by mobile genetic elements that were originally found in the genomes of insects and other species. These enzymes’ genetic editing power comes from their recognition of specific TTAA DNA sequences, where they insert a PiggyBac transposon, the DNA payload that edits the genome. Their ability to carry large DNA payloads makes them attractive for gene therapy applications. Although they are a powerful gene editing tool, naturally occurring PiggyBac enzymes can show limited precision and sequence diversity, restricting their use in some therapeutic and research contexts.
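As a rough illustration of the TTAA recognition described above, the following minimal Python sketch lists every position in a DNA string where a TTAA site occurs. The function name `find_ttaa_sites` and the example sequence are invented for illustration and do not come from the study; a plain string search also says nothing about which sites an enzyme would actually use in a cell.

```python
def find_ttaa_sites(sequence: str) -> list[int]:
    """Return the 0-based start positions of every TTAA site in a DNA string.

    TTAA is its own reverse complement, so scanning one strand is enough
    to find the sites on both strands.
    """
    sequence = sequence.upper()
    sites = []
    position = sequence.find("TTAA")
    while position != -1:
        sites.append(position)
        # Advance by one base so that overlapping matches are not skipped.
        position = sequence.find("TTAA", position + 1)
    return sites


# Made-up example sequence, not taken from the paper.
example = "GGTTAACCATTAAGTTAATTAA"
print(find_ttaa_sites(example))  # [2, 9, 14, 18]
```

Because the four-base TTAA motif is so short, it occurs very frequently in any large genome, which is one way to see why site recognition alone does not make integration precise.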
To overcome these limitations, the researchers turned to a protein large language model (pLLM) known as ProGen2. Fine-tuned on the 13,000 novel PiggyBac sequences the team had uncovered, the model generated new synthetic protein sequences designed to follow the biochemical and structural principles of natural proteins.
“For the first time, we have used generative AI to create synthetic parts and expand nature. Like the cognitive power of ChatGPT can be used to write a poem, we have used the protein-based large language models to generate new elements that comply with the physical and chemical principles of genes,” said Marc Güell, PhD, an ICREA research professor working at Pompeu Fabra University, and scientific director at Integra Therapeutics.
The researchers experimentally tested 22 synthetic variants, each differing by up to 54 amino acids from the commonly used naturally occurring HyPB (hyperactive PiggyBac) transposase. Seven of the variants showed higher excision activity than HyPB, and one sequence, named “Mega-PiggyBac,” showed significantly improved performance in both excision and targeted integration of DNA.
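To make the "differing by up to 54 amino acids" measure concrete, here is a minimal Python sketch that counts the positions at which two aligned protein sequences differ. It assumes the sequences have already been aligned to equal length; the function name `count_differences` and the short toy sequences are hypothetical and are not real PiggyBac sequences or the authors' method.

```python
def count_differences(reference: str, variant: str) -> int:
    """Count positions where two pre-aligned protein sequences differ.

    Assumes both strings have the same length (for example, rows of an
    alignment); a gap character such as '-' simply counts as a difference.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for ref_aa, var_aa in zip(reference, variant) if ref_aa != var_aa)


# Toy 10-residue sequences, invented for illustration only.
reference = "MKTLLDEHIL"
variant = "MKSLLDQHIV"
print(count_differences(reference, variant))  # 3
```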
In addition, the new AI-designed synthetic transposases were shown to be compatible with the FiCAT (Find and Cut-and-Transfer) platform, a gene editing system that uses a Cas9 enzyme fused to a PiggyBac transposase to insert genes at targeted locations. One synthetic sequence doubled the integration efficiency of FiCAT, highlighting its potential for precise genome engineering applications.
The team’s work was informed by previous research showing that large language models can be used to engineer CRISPR-Cas9 variants and catalytic enzymes. However, applying these methods to PiggyBac systems breaks new ground. Based on their findings, the research team emphasized the significance of the bioprospecting they conducted. “This approach not only expands the PiggyBac toolkit but also provides a valuable framework for the development of additional gene modification tools for precise and efficient genome manipulation applicable across biotechnology and therapeutic fields,” they wrote.
The team’s structural analysis also uncovered new types of DNA-binding domains and fusion architectures among PiggyBac sequences. Using AlphaFold3, they identified two distinct zinc-finger motifs (HC6H and C5HC2) in the cysteine-rich domain (CRD), which influence the way transposases bind and integrate DNA. This domain-level variation suggests that PiggyBac proteins have evolved diverse mechanisms of genome integration, which could be exploited for a variety of clinical use cases.
Looking ahead, the researchers plan to explore how AI-optimized transposases perform in animal models and clinical-grade settings. Further investigation will focus on specificity, which is vital for gene therapies to avoid off-target effects.