Scientists at the Icahn School of Medicine at Mount Sinai have developed a novel artificial intelligence (AI) tool that not only identifies disease-causing genetic mutations but also predicts the type of disease those mutations may trigger. The machine learning model, called V2P (Variant to Phenotype), is designed to accelerate genetic diagnostics and aid in the discovery of new treatments for complex and rare diseases.
In a newly published paper in Nature Communications, co-senior and co-corresponding authors Yuval Itan, PhD, Avner Schlessinger, PhD, and colleagues reported on the development and evaluation of V2P. The team showed that V2P’s predictions identified pathogenic variants in both real and simulated patient sequencing data and, in initial comparisons, outperformed the other methods tested.
“Our approach allows us to pinpoint the genetic changes that are most relevant to a patient’s condition, rather than sifting through thousands of possible variants,” said first author David Stein, PhD, who completed his doctoral training in the laboratories of Itan and Schlessinger. “By determining not only whether a variant is pathogenic but also the type of disease it is likely to cause, we can improve both the speed and accuracy of genetic interpretation and diagnostics.”
In their report, titled “Expanding the utility of variant effect predictions with phenotype-specific models,” the authors concluded, “V2P offers a complete mapping of human genetic variants to disease-phenotypes, offering a uniquely conditioned set of variant effect characterizations.”
Current genetic analysis tools can estimate whether a mutation is harmful, but they cannot determine the type of disease it might cause. “The increasing accessibility of high-throughput sequencing technologies has precipitated the proliferation of genetic data, including observed human sequence variants,” the authors wrote. But while significant efforts are being made to interpret such data, the large majority of variants still remain uncharacterized. “Current methods for variant effect prediction do not differentiate between pathogenic variants resulting in different disease outcomes and are restricted in application due to a focus on variants with a single molecular consequence.”
While substantial progress has been made over the past decades alongside continued generation of genetic data and more advanced detection methods, “several key limitations impede computational tools for variant assessment,” the investigators noted. Most methods can’t make interpretations across different types of genetic variation, for example, single-nucleotide variants (SNVs) and insertions/deletions (indels). Current methods also consider pathogenic variants as a homogeneous class, Stein and colleagues continued, “… and hence may underperform for certain genes or for variants with particular molecular mechanisms or disease presentations.”
V2P is designed to fill this gap, using advanced machine learning to link genetic variants with their likely phenotypic outcomes—that is, the diseases or traits a mutation might cause—effectively predicting how an individual’s DNA could influence their health.
To date, the investigators pointed out, most approaches for predicting the relationship between pathogenic genotype and phenotype work at the gene or protein level, while pathogenicity prediction methods designed around specific diseases or phenotypes tend to cover only a small number of them.
The authors describe V2P as a “multi-task, multi-output” machine learning model that can jointly predict variant pathogenicity and the broad phenotypic effect of SNVs and insertions/deletions (indels) throughout the human genome. “Contrary to methods solely estimating pathogenicity in general, V2P’s output comprises 24 values, each ranging between zero and one, indicating a given variant’s likelihood of being pathogenic or benign as well as the variant’s likelihood to result in one or more of the 23 first-level disease phenotypes from the phenotypic abnormality sub-ontology of the HPO [Human Phenotype Ontology],” they stated.
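To make the shape of that output concrete, the following is a minimal Python sketch of how a single prediction of this kind might be read: one pathogenicity score plus a set of per-phenotype scores, each between zero and one. It is illustrative only. The function name, the label names, the 0.5 cutoff, and the scores are all assumptions made for the example, not V2P’s actual interface or results, and only three of the 23 phenotype categories are shown.

```python
from typing import Dict, List, Tuple

# Hypothetical key for the overall pathogenic-versus-benign score.
PATHOGENICITY_KEY = "pathogenic"


def summarize_prediction(
    scores: Dict[str, float], threshold: float = 0.5
) -> Tuple[float, List[str]]:
    """Split a multi-output prediction into its pathogenicity score and
    the phenotype labels scoring at or above an assumed cutoff."""
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {value}")

    pathogenicity = scores[PATHOGENICITY_KEY]
    # A variant may be linked to more than one phenotype, so every label
    # that clears the cutoff is kept, highest score first.
    flagged = sorted(
        (
            name
            for name, value in scores.items()
            if name != PATHOGENICITY_KEY and value >= threshold
        ),
        key=lambda name: scores[name],
        reverse=True,
    )
    return pathogenicity, flagged


# Made-up scores for one variant, with made-up label names.
example = {
    "pathogenic": 0.91,
    "nervous_system": 0.84,
    "cardiovascular": 0.12,
    "neoplasm": 0.55,
}

print(summarize_prediction(example))
```

Because the phenotype scores are independent values rather than a single choice among categories, a variant in this sketch can be flagged for several phenotypes at once, in line with the authors’ wording of “one or more” phenotypes.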
The tool was trained on a large database of both harmful and benign genetic variants, incorporating disease information to improve prediction accuracy. In tests using real, de-identified patient data, V2P often ranked the true disease-causing variant among the top 10 candidates, highlighting its potential to streamline genetic diagnostics. “Across the 23 examined phenotypes, V2P’s phenotype-specific scores yielded on average a 0.16 improvement in AP [average precision] score compared to the next best method for the given phenotype and an average increase of 0.38 over the median AP score of the compared methods across phenotypes in the three evaluation datasets,” the scientists reported.
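For readers unfamiliar with the two measures mentioned here, the sketch below shows one common way each could be computed, taking “AP” to mean average precision in its standard ranking sense. This is an assumption-laden illustration in plain Python, not the authors’ evaluation code: the function names, variant identifiers, labels, and scores are all hypothetical.

```python
from typing import Dict, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of precision-at-rank taken at each
    position where a true positive appears, ranking by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def in_top_k(scores_by_variant: Dict[str, float], causal_variant: str, k: int = 10) -> bool:
    """Check whether the known causal variant is among the k highest-scoring
    candidates for a patient."""
    ranked = sorted(scores_by_variant, key=scores_by_variant.get, reverse=True)
    return causal_variant in ranked[:k]


# Made-up example: 1 marks a variant truly linked to the phenotype.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(labels, scores), 4))

# Made-up candidate list for one patient; "var_c" is the assumed causal variant.
patient_scores = {"var_a": 0.2, "var_b": 0.9, "var_c": 0.5}
print(in_top_k(patient_scores, "var_c", k=2))
```

Average precision rewards a method for placing true positives near the top of its ranking, which is why it suits comparisons where pathogenic variants are heavily outnumbered by benign ones.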
“V2P gives us a clearer window into how genetic changes translate into disease, which has important implications for both research and patient care,” added Itan, who is an associate professor of artificial intelligence and human health, and genetics and genomic sciences, a core member of the Charles Bronfman Institute for Personalized Medicine, and a member of the Mindich Child Health and Development Institute at the Icahn School of Medicine at Mount Sinai. “By connecting specific variants to the types of diseases they are most likely to cause, we can better prioritize which genes and pathways warrant deeper investigation. This helps us move more efficiently from understanding the biology to identifying potential therapeutic approaches and, ultimately, tailoring interventions to an individual’s specific genomic profile.”
In their paper, the team further concluded, “Together, these results indicate that V2P’s phenotype-specific approach may have utility for the identification of pathogenic variants in the context of their phenotypic effects. For the investigation of a particular phenotype or disease, V2P may offer a unique perspective on variant effect.”
Commented Schlessinger, a professor of pharmacological sciences and director of the AI Small Molecule Drug Discovery Center at the Icahn School of Medicine at Mount Sinai, “Beyond diagnostics, V2P could help researchers and drug developers identify the genes and pathways most closely linked to specific diseases. This can guide the development of therapies that are genetically tailored to the mechanisms of disease, particularly in rare and complex conditions.”
While V2P currently classifies mutations into broad categories such as nervous system disorders or cancers, the researchers aim to refine the tool to predict more specific disease outcomes and integrate it with additional data sources to support drug discovery.
This innovation represents a step toward precision medicine, in which treatments can be matched to a patient’s genetic profile. By connecting genetic variants to their likely disease effects, V2P may help clinicians diagnose more efficiently and help scientists identify new therapeutic targets, say the investigators. “We anticipate that the novel resources provided by V2P will allow for new insights into the relationship between pathogenic variants and their phenotypic outcomes during future investigations and as these data are explored in detail by the genetics community,” the team concluded.
