Combining a single-gene diagnostic biomarker with a large language model (LLM) analysis of electronic medical records can substantially improve the diagnosis of lower respiratory tract infections (LRTI) in critically ill adults, according to researchers at UC San Francisco. Their observational study, published in Nature Communications, describes how this integrated approach distinguished infectious from non-infectious causes of respiratory failure more accurately than either method alone, and more accurately than the diagnoses made by intensive care unit clinicians at the time of admission. In an independent validation cohort, the model made a correct diagnosis 96% of the time, which could have reduced inappropriate antibiotic prescribing in the study group by 80% had it been available when the patients were admitted.
“We’ve devised a method that gives results much faster than a culture, and it could be easy to implement in the clinic,” said Chaz Langelier, MD, PhD, an associate professor of medicine at UCSF and senior author of the study. “We’re confident that it could lead to faster diagnosis and curtail the unnecessary use of antibiotics.”
Fast, accurate diagnosis is important for patients admitted to the hospital, particularly those admitted to the intensive care unit. “Lower respiratory tract infections (LRTI) are a leading cause of mortality and are challenging to diagnose in critically ill patients, as non-infectious causes of respiratory failure can present with similar clinical features,” the researchers wrote. When it is uncertain whether a patient has an infection, antibiotics are often overprescribed, which can lead to adverse effects such as Clostridioides difficile infection or antimicrobial resistance.
To address this uncertainty, the UCSF team integrated two tools that capture different aspects of LRTI. One is a pulmonary transcriptomic biomarker based on the expression of FABP4, a gene involved in inflammatory signaling. The other is the Generative Pre-trained Transformer 4 (GPT-4) LLM, which analyzes clinical notes from the electronic medical record, including admission notes and chest x-ray reports.
FABP4 was identified as an LRTI biomarker by the UCSF researchers in 2023 in patients with acute respiratory failure. Prior studies showed its utility primarily for identifying infection in both adults and children, and across cohorts dominated by bacterial or viral infections. This was significant because it suggested that FABP4 is agnostic to pathogen type. Mechanistically, FABP4 is highly expressed in alveolar macrophages that are depleted during respiratory infections, including bacterial pneumonia and COVID-19. While FABP4 outperformed commonly used clinical biomarkers such as C-reactive protein and procalcitonin, the researchers noted that it alone did not reach the accuracy needed to guide antimicrobial decisions in critically ill patients, a result that led the team to develop the combined approach with the LLM.
In their new work, the investigators analyzed data from two groups of ICU patients with acute respiratory failure. The derivation cohort included 98 patients enrolled before the COVID-19 pandemic, most with bacterial infections. The validation cohort included 59 patients enrolled during the pandemic, most with viral infections. Each diagnostic method alone, whether FABP4 expression or GPT-4 analysis, correctly classified about 80% of cases. The combined classifier achieved an area under the receiver operating characteristic curve of 0.93 in the derivation cohort and 0.98 in the validation cohort, with diagnostic accuracy of 84% and 96%, respectively. By comparison, the treating medical team's diagnosis at admission was accurate 72% of the time.
To better understand how the LLM performed, the team compared GPT-4’s analysis of medical records with reviews by physicians specializing in internal medicine and infectious diseases. GPT-4 and the physicians had similar levels of accuracy, but the AI placed more emphasis on radiology reports, while the physicians focused more on the clinical notes. “It was almost showing a cultural difference, if you can say that about an AI,” said first author Natasha Spottiswoode, MD, an infectious disease physician-scientist at UCSF. “It shows how AI can complement the work physicians do.”
The researchers noted that their method was designed to be easy for other health systems to adopt. For this reason, the AI prompts are published in the study, and the researchers said they could be run on HIPAA-compliant platforms without specialized computational training. “Using this is unbelievably simple, you don’t have to be a bioinformatician,” said Hoang Van Phan, PhD, another first author.
Next steps include validating the combined biomarker and AI approach as a clinical test in larger cohorts and in prospective studies. Based on the success of this research, the team will now turn their attention to developing a similar tool for sepsis, the number one cause of hospital deaths, and a condition that is notoriously difficult to treat.
