A new study by researchers at Dartmouth Health highlights the potential risks of artificial intelligence in medical imaging research, showing that algorithms can be taught to give correct answers but for illogical reasons.
The study, published in Nature’s Scientific Reports, used a cache of 5,000 X-rays of human knee joints, along with surveys the X-rayed patients had completed about their dietary habits.
Artificial intelligence software was then asked to identify, from the X-rays alone, which patients were most likely to drink beer or eat refried beans, even though a knee X-ray contains no visual evidence of either habit.
“We want to assume it sees things that a human would see, or a human would see if we only had just better vision,” said the paper’s co-author, Brandon Hill, a machine-learning researcher at Dartmouth Hitchcock. “And that's the core problem here: is that when it makes these associations, we presume it must be from something in the physiology, in the medical image. And that's not necessarily the case.”
While the machine learning tool did in fact often correctly determine which of the knees (that is, which of the people X-rayed) belonged to someone more likely to drink beer or eat beans, it did so by inferring race, gender and the city in which the medical image was taken. The algorithm could even determine which model of X-ray machine took the original images, which allowed it to connect the location of a scan to the likelihood of certain dietary habits.
Ultimately, it was those variables that the AI used to determine who drank beer and ate refried beans, and not anything in the image itself related to food or beverage consumption, a phenomenon researchers call “shortcutting.”
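To make the idea concrete, here is a minimal, hypothetical sketch in Python; it is not the study’s code, and every variable name and number is invented for illustration. A classifier appears to predict a dietary label from “image features,” but the only usable signal in those features is a scanner/site signature that happens to correlate with the label, which is the essence of shortcutting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000  # hypothetical sample size, echoing the article's figure

# Hypothetical confound: which of two sites / scanner models produced the image.
site = rng.integers(0, 2, size=n)

# The dietary label (e.g., "drinks beer") is correlated with site,
# not with anything visible in the image itself.
label = (rng.random(n) < np.where(site == 1, 0.7, 0.3)).astype(int)

# Simulated "image features": mostly noise, plus a faint scanner signature,
# which is the shortcut a flexible model can latch onto.
features = rng.normal(size=(n, 50))
features[:, 0] += 2.0 * site

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    features, label, site, test_size=0.25, random_state=0
)

# The model looks like it predicts the dietary habit from the "image"...
diet_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy predicting the dietary label:", diet_model.score(X_te, y_te))

# ...but the same features predict the scanner/site far more reliably,
# a sign the apparent skill comes from the confound, not the anatomy.
site_model = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
print("accuracy predicting the scanner/site:", site_model.score(X_te, s_te))
```

In a real study, the analogous check is whether the model can recover the scanner model, site or patient demographics from the images, which is essentially what the Dartmouth team observed.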
“Part of what we're showing is, it's a double-edged sword. It can see things humans can't,” said Hill. “But it can also see patterns that humans can't, and that can make it easy to deceive you.”
The study’s authors said the paper underscores the caution medical researchers should exercise when deploying machine learning tools.
“If you have AI that's detecting whether or not you think a transaction on a credit card is fraudulent, who cares why it thinks that? Let's just stop the credit card from being able to have charges,” said Dr. Peter Schilling, an orthopedic surgeon and the paper’s senior author.
But in the treatment of patients, Schilling advises clinicians to move forward conservatively with these tools in order to “actually optimize the care they’re given.”