While artificial intelligence-powered systems can improve clinicians’ diagnostic accuracy during telehealth visits, doctors still don’t fully trust algorithms to screen patients. That’s according to a recent study by Johns Hopkins researchers, which they say highlights the need to improve human-AI collaboration.
“Physicians do better with AI assistance, yet they still hesitate to alter their practices significantly as a result. For example, they will still frequently ask telehealth patients to visit the clinic for definitive testing,” says Mathias Unberath, John C. Malone Associate Professor of Computer Science and a member of the team whose study appeared in Nature Communications Medicine.
Unberath and Therese Canares, a Johns Hopkins emergency medicine physician, used a smartphone-based AI system they developed (CurieDx) to examine “explainable” AI’s potential to enhance clinicians’ trust in AI-driven diagnostic tools. They specifically focused on how doctors use and perceive the system’s explanations of strep throat diagnoses, which rely on analyzing smartphone images of users’ throats.
The researchers created mockups featuring techniques the AI system might use to explain its diagnoses, including highlighting key visual features in an image and showing examples of images it had previously analyzed and correctly diagnosed as strep or not strep.
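The article does not describe how CurieDx generates these explanations internally, but explanation by example is often built as a nearest-neighbor lookup over image embeddings. The sketch below is purely illustrative, not the team's implementation: the embeddings, labels, and the explain_by_example function are hypothetical stand-ins, and it assumes some feature extractor has already mapped each throat image to a vector.

```python
# Illustrative sketch of example-based explanation (not CurieDx's actual code).
# Assumes a separate feature extractor has already turned each throat image
# into an embedding vector; all names and data here are hypothetical.
import numpy as np

def explain_by_example(query_embedding: np.ndarray,
                       reference_embeddings: np.ndarray,
                       reference_labels: list[str],
                       k: int = 3) -> list[tuple[int, str, float]]:
    """Return the k previously diagnosed cases most similar to the query.

    Each (index, label, similarity) triple identifies a reference image that
    could be shown to the clinician alongside the model's prediction.
    """
    # Cosine similarity between the query and every reference embedding.
    q = query_embedding / np.linalg.norm(query_embedding)
    refs = reference_embeddings / np.linalg.norm(
        reference_embeddings, axis=1, keepdims=True)
    similarities = refs @ q

    # Pick the k closest reference cases, most similar first.
    top = np.argsort(similarities)[::-1][:k]
    return [(int(i), reference_labels[i], float(similarities[i])) for i in top]

# Toy usage: five reference images with four-dimensional embeddings.
rng = np.random.default_rng(0)
bank = rng.normal(size=(5, 4))
labels = ["strep", "not strep", "strep", "not strep", "strep"]
print(explain_by_example(rng.normal(size=4), bank, labels, k=2))
```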
The team had clinicians review the mockups, measuring their agreement with CurieDx’s diagnostic recommendations, their trust in the system, and how the explanations influenced their own recommendations for patients. “We found that explaining by example was the most promising method,” says lead author Catalina Gomez, a PhD student in computer science.
The researchers theorized that this type of explanation was the most successful because it most closely mirrors human clinical reasoning, which incorporates prior experience into the analysis of a patient’s condition.
“This kind of explanation improved the accuracy of the clinicians’ decisions. The providers also trusted the AI-generated predictions just as much as they trusted the diagnoses produced by their customary clinical prediction rule, which acted as our baseline,” says Gomez.
Even so, the clinicians in the study often felt it was necessary to ask patients diagnosed remotely to visit the clinic for a follow-up. “This opens the dialogue to examine clinical workflows that incorporate AI screening tools,” says Canares.
The team plans to continue exploring explainable AI methods to help users better understand how AI-powered systems work and further enhance user trust in those systems.