February 10, 2020
Smartphone applications (apps) that use so-called artificial intelligence (AI) to assess suspicious skin lesions for the general public are unreliable, say UK researchers reporting a systematic review.
These apps are providing information that could lead to "potentially life-or-death decisions," commented co-lead author Hywel C. Williams, MD, PhD, from the Centre of Evidence Based Dermatology, University of Nottingham, England.
"The one thing you mustn't do in a situation where early diagnosis can make a difference between life and death is you mustn't miss the melanoma," he said in an interview with Medscape Medical News.
"These apps were missing melanomas and that's very worrisome," he commented.
The review included nine studies of skin cancer smartphone apps, including two apps, SkinScan and SkinVision, that have been given Conformité Européenne (CE) marks, allowing them to be marketed across Europe. These apps are also available in Australia and New Zealand, but not in the United States.
The review found that SkinScan was not able to identify any melanomas in the one study that assessed this app, while SkinVision had a relatively low sensitivity and specificity, with 12% of cancerous or precancerous lesions missed and 21% of benign lesions wrongly identified as cancerous.
This means that among 1000 people with a melanoma prevalence of 3%, 4 of 30 melanomas would be missed, and 200 people would be incorrectly told that a mole was of high concern, the authors estimate.
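The authors' estimate follows from simple arithmetic on the reported miss and false-alarm rates. The sketch below reproduces it; the function name and structure are illustrative, not from the study.

```python
# Back-of-envelope check of the review's estimate for SkinVision:
# among 1000 users with 3% melanoma prevalence, how many melanomas
# are missed and how many benign lesions are wrongly flagged?

def triage_outcomes(n_people, prevalence, miss_rate, false_alarm_rate):
    """Return (missed melanomas, false alarms) for a screening app.

    miss_rate        = 1 - sensitivity (fraction of cancers missed)
    false_alarm_rate = 1 - specificity (fraction of benign lesions flagged)
    """
    melanomas = n_people * prevalence          # 1000 * 0.03 = 30
    benign = n_people - melanomas              # 970
    missed = melanomas * miss_rate             # 30 * 0.12 = 3.6, about 4
    false_alarms = benign * false_alarm_rate   # 970 * 0.21 = 203.7, about 204
    return round(missed), round(false_alarms)

missed, false_alarms = triage_outcomes(1000, 0.03, 0.12, 0.21)
print(missed, false_alarms)  # 4 204
```

The roughly 204 false alarms match the article's "200 people incorrectly told that a mole was of high concern" once rounded to the nearest hundred.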
The research was published by The BMJ on February 10.
"Although I was broad minded on the potential benefit of apps for diagnosing skin cancer, I am now worried given the results of our study and the overall poor quality of studies used to test these apps," Williams commented in a statement.
Co-author Jac Dinnes, PhD, from the Institute of Applied Health Research at the University of Birmingham, England, added that it is "really disappointing that there is not better quality evidence available to judge the efficacy of these apps."
"It is vital that healthcare professionals are aware of the current limitations both in the technologies and in their evaluations," she added.
The results also highlight the limitations of the regulatory system governing smartphone apps, as they are currently not subject to assessment by bodies such as the UK's Medicines and Healthcare Products Regulatory Agency (MHRA), the authors comment.
"Regulators need to become alert to the potential harm that poorly performing algorithm-based diagnostic or risk monitoring apps create," said co-lead author Jonathan J. Deeks, PhD, also at the Institute of Applied Health Research.
"We rely on the CE mark as a sign of quality, but the current CE mark assessment processes are not fit for protecting the public against the risks that these apps present."
Speaking with Medscape Medical News, Williams lamented the poor quality of the research that had been conducted. "These studies were not good enough," he said, adding that "there's no excuse for really poor study design and poor reporting."
He would like to see the regulations tightened around AI apps purporting to inform decision-making for the general public, suggesting that these devices should be assessed by the MHRA. "I really do think a CE mark is not enough," he said.
The team notes that the skin cancer apps "all include disclaimers that the results should only be used as a guide and cannot replace healthcare advice," through which the manufacturers "attempt to evade any responsibility for negative outcomes experienced by users."
Nevertheless, the "poor and variable performance" of the apps revealed by their review indicates that they "have not yet shown sufficient promise to recommend their use," they conclude.
The "official approval" implied by a CE mark "will give consumers the impression that the apps have been assessed as effective and safe," writes Ben Goldacre, DataLab director, Nuffield Department of Primary Care, University of Oxford, England, and colleagues in an accompanying editorial.
"The implicit assumption is that apps are similarly low-risk technology" to devices such as sticking plasters and reading glasses, they comment.
"But shortcomings in diagnostic apps can have serious implications," they warn. The "risks include psychological harm from health anxiety or 'cyberchondria,' and physical harm from misdiagnosis or overdiagnosis; for clinicians there is a risk of increased workload, and changes to ethical or legal responsibilities around triage, referral, diagnosis, and treatment." There is also potential for "inappropriate resource use, and even loss of credibility for digital technology in general."
Details of the Review
For their review, the authors searched the Cochrane Central Register of Controlled Trials, the MEDLINE, Embase, Cumulative Index to Nursing and Allied Health Literature, Conference Proceedings Citation Index, Zetoc, and Science Citation Index databases, and online trial registers for studies published between August 2016 and April 2019.
From 80 studies identified, nine met the eligibility criteria.
Of those, six studies, evaluating a total of 725 skin lesions, determined the accuracy of smartphone apps in risk stratifying suspicious skin lesions by comparing them against a histopathological reference standard diagnosis or expert follow-up.
Five of these studies aimed to detect only melanoma, while one sought to differentiate between malignant or premalignant lesions (including melanoma, basal cell carcinoma, and squamous cell carcinoma) and benign lesions.
The three remaining studies, which evaluated 407 lesions in all, compared smartphone app recommendations against a reference standard of expert recommendations for further investigation or intervention.
The researchers found the studies had a string of potential biases and limitations.
For example, only four studies recruited a consecutive sample of study participants and lesions, and only two included lesions selected by study participants, whereas five studies used lesions that had been selected by a clinician.
Three studies reported that it took five to 10 attempts to obtain an adequate image. In seven studies, it was the researchers and not the patients who used the app to photograph the lesions, and two studies used images obtained from dermatology databases.
This "raised concerns that the results of the studies were unlikely to be representative of real life use," the authors comment.
In addition, the exclusion of unevaluable images "might have systematically inflated the diagnostic performance of the tested apps," they add.
The independent research was supported by the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham and is an update of one of a collection of reviews funded by the NIHR through its Cochrane Systematic Review Programme Grant.