Artificial intelligence algorithms for predicting suicidal behavior are too inaccurate to screen for people at high risk, a systematic review and meta-analysis has revealed.
The findings, in PLOS Medicine, dash hopes that modern machine learning methods are sophisticated enough to single out those most at risk of suicide and self-harm for personalized interventions.
Machine learning misclassified more than half of the people who went to the hospital for self-harm or died by suicide as being at low risk.
“Many clinical practice guidelines around the world strongly discourage the use of risk assessment for suicide and self-harm as the basis on which to allocate effective after-care interventions,” reported Matthew Spittal, PhD, from the University of Melbourne, and co-workers.
“Our study shows that machine learning algorithms do no better at predicting future suicidal behavior than the traditional risk assessment tools that these guidelines were based on. We see no evidence to warrant changing these guidelines.”
Over the past 50 years, numerous risk assessment scales have been developed to classify patients as being at high or low risk of suicide or self-harm.
Treatment pathways are often based on such risk assessments, but these scales generally lack accuracy, which is why clinical practice guidelines strongly advise against using them to decide who receives aftercare for suicide and self-harm.
Nonetheless, there has been renewed interest in developing predictive algorithms with the arrival of modern machine learning methods and access to electronic health record and registry data.
To investigate the accuracy of these newer methods, Spittal and team undertook a systematic review and meta-analysis of 53 studies identified through online databases.
Studies were included if they examined suicide or hospital-treated self-harm outcomes and used a case-control, case-cohort or cohort study design.
Overall, the researchers reported that studies in the field were of low quality, with most at either high or unclear risk of bias.
While the algorithms had good accuracy when assessed using a global measure—with an area under the receiver operating characteristic curve ranging from 0.69 to 0.93—they had poor accuracy when assessed against more clinically relevant individual measures.
The algorithms had modest sensitivity, of between 45% and 82%, and high specificity, of between 91% and 95%. Positive likelihood ratios ranged from 6.5 to 9.9 and negative likelihood ratios from 0.2 to 0.6.
This meant that although they were good at identifying people who would not present again for self-harm or die by suicide, they were generally poor at identifying those who would.
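As a rough illustration (not part of the published analysis), the reported likelihood ratios follow directly from sensitivity and specificity. The short Python sketch below uses figures from the upper end of the reported ranges; the function name and exact inputs are illustrative assumptions, not the study's code.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)  # how much a "high risk" result raises the odds
    lr_negative = (1 - sensitivity) / specificity  # how much a "low risk" result lowers the odds
    return lr_positive, lr_negative

# Upper end of the reported ranges: sensitivity 82%, specificity 91%
print(likelihood_ratios(0.82, 0.91))  # ~ (9.1, 0.2), consistent with the reported LR+ of up to 9.9 and LR- of 0.2
```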
The modest sensitivity observed in the cohort studies indicated that more than half of those who go on to repeat self-harm or die by suicide are misclassified as being at low risk; at the lower end of the range, a sensitivity of 45% means 55% of these patients would be classed as low risk.
Overall, the researchers deemed the predictive properties of the machine learning algorithms poor and no better than traditional risk assessment scales.
They advocate that the management of patients treated in hospital for self-harm include three components: a needs-based assessment and response; the identification of modifiable risk factors, with treatment intended to reduce those exposures; and the implementation of aftercare interventions with demonstrated effectiveness.
“Instead of predicting suicide and self-harm, there may be other ways in which artificial intelligence could be used to contribute to better outcomes for suicidal patients,” the team suggested.
“Future research could consider how machine learning methods could be used to augment existing collaborative psychosocial assessments.
“Specifically, can machine learning methods be used to identify modifiable risk factors for suicide and self-harm for individual patients? This may be a more tractable problem as the prevalence of many risk factors is likely to be higher than the prevalence of suicide or self-harm.”