Researchers at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, in collaboration with the German Cancer Research Centre (DKFZ) and the University of Copenhagen in Denmark, have developed an AI model that uses large-scale health records to understand how human health develops over time and predict the risk of over 1,000 diseases developing within the next two decades.
“Our AI model is a proof of concept, showing that it’s possible for AI to learn many of our long-term health patterns and use this information to generate meaningful predictions,” said Ewan Birney, PhD, interim executive director at the European Molecular Biology Laboratory.
“By modeling how illnesses develop over time, we can start to explore when certain risks emerge and how best to plan early interventions. It’s a big step toward more personalized and preventive approaches to healthcare.”
In the study, published in Nature and titled “Learning the Natural History of Human Disease with Generative Transformers,” the researchers trained an AI model called Delphi-2M on anonymized patient data from 400,000 participants in the UK Biobank.
Trained on these data, the AI model learned to represent people's medical histories as sequences of events that unfold over time. These events include medical diagnoses recorded as top-level ICD-10 diagnostic codes, personal information such as sex and body mass, lifestyle factors such as smoking and alcohol consumption, and death.
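The idea of a medical history as a time-ordered sequence of events can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual data format: the `HealthEvent` class, the token labels (`SEX_MALE`, `SMOKER`), the helper functions, and the example history are all hypothetical and made up for this sketch. Only the ICD-10 codes themselves (I21 for acute myocardial infarction, E11 for type 2 diabetes mellitus) are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthEvent:
    """One event in a person's history (hypothetical structure for illustration)."""
    age_years: float  # age at which the event was recorded
    token: str        # label such as an ICD-10 code or a lifestyle marker


def build_trajectory(events):
    """Return the events sorted by age, the order a sequence model would read them."""
    return sorted(events, key=lambda e: e.age_years)


def to_model_input(trajectory, vocab):
    """Map each event to an (age, token id) pair, adding unseen tokens to vocab."""
    pairs = []
    for event in trajectory:
        token_id = vocab.setdefault(event.token, len(vocab))
        pairs.append((event.age_years, token_id))
    return pairs


# Made-up example history; not real patient data.
example_events = [
    HealthEvent(52.3, "I21"),       # ICD-10 I21: acute myocardial infarction
    HealthEvent(0.0, "SEX_MALE"),   # hypothetical token for sex, placed at birth
    HealthEvent(45.0, "SMOKER"),    # hypothetical lifestyle token
    HealthEvent(48.7, "E11"),       # ICD-10 E11: type 2 diabetes mellitus
]

vocab = {}
trajectory = build_trajectory(example_events)
model_input = to_model_input(trajectory, vocab)
```

Here `model_input` is a list of (age, token id) pairs in chronological order. A generative sequence model would be trained to predict the next such pair from the ones before it, in the same way a language model predicts the next word.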
Drawing on these previous diagnoses, lifestyle factors, and other informative data, the researchers' AI model learns lifetime health trajectories and can accurately predict the future occurrence of more than 1,000 diseases, including different cancers, diabetes, and heart attacks, as well as death.
“Medical events often follow predictable patterns,” explained Tom Fitzgerald, co-author of the study and staff scientist at the EMBL’s European Bioinformatics Institute in Hinxton, U.K.
“Our AI model learns those patterns and can forecast future health outcomes. It gives us a way to explore what might happen based on a person’s medical history and other key factors. Crucially, this is not a certainty, but an estimate of the potential risks.”
The researchers validated the AI model in two completely separate healthcare systems: on data from the UK Biobank and on data from 1.9 million patients in the Danish National Patient Registry.
Moreover, they found that their model was more successful at predicting conditions with clear and consistent progression patterns, such as specific cancers, heart attacks, and a type of blood poisoning called septicemia. Conditions such as mental health disorders or pregnancy-related complications were more difficult for the AI model to predict, as they are more variable and influenced by unpredictable life events.
In their study, the researchers emphasized that their AI model can only provide probabilities, that is, the risk of a certain disease occurring over a given period of time, rather than certainties.
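To make "risk over a given period of time" concrete, the sketch below converts a yearly event rate into the probability of at least one event within a time window. It rests on a simplifying assumption that is ours, not the study's: the rate is constant over the whole window. The function name `risk_within` and the example rate are hypothetical and purely illustrative.

```python
import math


def risk_within(rate_per_year, years):
    """Probability of at least one event within `years`.

    Assumes a constant event rate (an illustrative simplification,
    not a description of how the study's model computes risk).
    """
    if rate_per_year < 0 or years < 0:
        raise ValueError("rate_per_year and years must be non-negative")
    return 1.0 - math.exp(-rate_per_year * years)


# Made-up example: a rate of 0.01 events per year over a 20-year window.
twenty_year_risk = risk_within(0.01, 20)
```

With these made-up numbers the result is about 0.18, meaning roughly an 18 percent chance that the event occurs at some point in the 20 years. It is a probability, not a statement that the event will or will not happen.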
While the AI model shows great promise, especially for one day easing the burden on healthcare systems and helping physicians identify high-risk patients, it also comes with limitations.
For example, the data the AI model was trained on came from people aged between 40 and 60, which means that health events in younger individuals are underrepresented. The training data also carry demographic biases, as certain ethnic groups are not included.
Even though the AI model is not yet ready for clinical use, it can already be used to understand how diseases develop and progress over time, to examine how lifestyle factors and medical diagnoses affect long-term disease risk, and to simulate health outcomes when real-world data are difficult to access.
“This is the beginning of a new way to understand human health and disease progression,” said co-author Moritz Gerstung, PhD, head of the division of AI in oncology at DKFZ and a professor at the University of Heidelberg.
“Generative models such as ours could one day help personalize care and anticipate healthcare needs at scale. By learning from large populations, these models offer a powerful lens into how diseases unfold, and could eventually support earlier, more tailored interventions.”