An artificial intelligence (AI)-based large language model (LLM) was able to match clinician suggestions for simple, early-stage hepatocellular carcinoma but was less able to match clinical recommendations for more complex, later-stage cases.
As reported in PLOS Medicine, in earlier-stage cases, receiving the treatment suggested by the LLM was linked to better survival, but in late-stage cases, patients whose treatment matched the AI suggestion had poorer outcomes.
“Liver cancer… is common worldwide, and choosing the right treatment can be difficult because it depends on both the cancer stage and how well the liver is functioning,” write lead author Ji Won Han, a researcher based at the Catholic University of Korea, and colleagues.
“Although international guidelines provide recommendations, real-world treatment often varies because doctors tailor decisions to each patient’s situation. LLMs such as ChatGPT, Gemini, and Claude can summarize medical information, but it is not known whether their treatment advice would match what doctors actually do in practice.”
LLMs are increasingly being used to aid treatment decisions in hospitals and clinics, so Han and colleagues analyzed how well the treatment suggestions provided by LLMs performed for patients across a range of hepatocellular carcinoma stages.
Overall, 13,614 previously untreated cases of hepatocellular carcinoma diagnosed between 2008 and 2020 in Korea were included in the study. Treatment recommendations from the LLMs ChatGPT 4o, Gemini 2.0, and Claude 3.5 were generated using standardized prompts based on guidance from the American Association for the Study of Liver Diseases and the European Association for the Study of the Liver. The researchers then classified a case as ‘matched’ when the patient had received the treatment the LLM suggested.
The team compared overall survival between matched and mismatched patients. Across the three models, the LLM-recommended treatment matched the treatment patients actually received in 26–33% of cases.
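The matching step described above can be illustrated with a minimal Python sketch. It is not the study's code: the `Case` class, the function names, and the four toy cases are hypothetical, and the simple case-insensitive string comparison is an assumption about how a "match" might be decided.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One patient record (hypothetical structure, not the study's schema)."""
    received: str     # treatment the patient actually received
    recommended: str  # treatment the LLM suggested


def is_matched(case: Case) -> bool:
    # Assumption: a case counts as matched when the two treatment labels
    # are identical after trimming whitespace and ignoring letter case.
    return case.received.strip().lower() == case.recommended.strip().lower()


def match_rate(cases: list[Case]) -> float:
    """Fraction of cases in which the received treatment matched the LLM's."""
    if not cases:
        raise ValueError("at least one case is required")
    return sum(is_matched(c) for c in cases) / len(cases)


# Toy, made-up cases for illustration only; these are not study data.
cases = [
    Case(received="resection", recommended="resection"),
    Case(received="TACE", recommended="systemic therapy"),
    Case(received="ablation", recommended="Ablation"),
    Case(received="TACE", recommended="resection"),
]

print(f"Match rate: {match_rate(cases):.0%}")
```

Once each case carries a matched or mismatched label, survival can be compared between the two groups, which is the comparison the team reports.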
In patients with early-stage liver cancer, receiving treatment that matched the LLM recommendation was linked to higher survival, but in patients with advanced-stage disease, matched treatment was linked to lower survival.
In an analysis of the decision-making of both the LLMs and the physicians, the researchers found that physicians gave more weight to measures of liver function, whereas the LLMs focused more on tumor features.
In early-stage liver cancer, physicians often withheld curative treatments when hepatic reserve was poor, whereas in advanced-stage disease they tended to use liver‑directed procedures in patients with good liver function, even though guidelines recommended systemic therapy instead.
“LLMs may help support straightforward treatment decisions that closely follow clinical guidelines, especially in early-stage cancer,” conclude the authors.
“However, they are not yet reliable for complex cases where doctors must consider many individual factors beyond what guidelines capture. LLM-generated advice should be used cautiously, only as a supplemental tool, and always alongside professional medical judgment.”
