
Is There an Artificial Intelligence in the House?

By Matthew R. Francis

Medical care routinely involves life-or-death decisions, the allocation of expensive or rare resources, and ongoing management of real people’s health. Mistakes can be costly or even deadly, and healthcare professionals—as human beings themselves—are prone to the same biases and bigotries as the general population.

For this reason, medical centers in many countries are beginning to incorporate artificial intelligence (AI) into their practices. After all, computers in the abstract are not subject to the same foibles as humanity. In practice, however, medical AI perpetuates many of the same biases that are present in the system, particularly in terms of disparities in diagnosis and treatment (see Figure 1).

“Everyone knows that biased data can lead to biased output,” Ravi Parikh, an oncologist at the University of Pennsylvania, said. “The issue in healthcare is that the decision points are such high stakes. When you talk about AI, you’re talking about how to deploy resources that could reduce morbidity, keep patients out of the hospital, and save someone’s life. That’s why bias in healthcare AI is arguably one of the most important and consequential aspects of AI.”

While AI as a subfield of computer science has been around for over 60 years, only in the past decade has it yielded broadly usable algorithms. That usability comes in the form of machine learning, which extrapolates patterns in training data to solve various problems. The training data is one place where problems can arise.

“Healthcare has been late to the party with AI and machine learning,” Parikh said. “We’ve had to learn from a lot of lessons that have come up with other industries. [It] isn’t that AI is inherently biased; even traditional predictive tools and other things can be biased as well. The problem is that the data used in healthcare is subject to biases that certain clinicians, health systems, or payers might perpetuate, including coverage decisions that may disproportionately affect certain minority groups.”

Figure 1. As healthcare providers begin using artificial intelligence (AI) to help guide medical decisions, it is important that computerized medicine does not perpetuate systemic inequalities in diagnosis or treatment methods. Public domain image.

Parikh spoke about improving medical AI at the 2021 American Association for the Advancement of Science (AAAS) Annual Meeting, which took place virtually in February. He identified three specific forms of statistical bias (in addition to normal measurement error): undersampling, labeling problems, and heterogeneity of effects. Undersampling can occur because white people tend to have better access to healthcare, while systemic problems lead to the inclusion of a disproportionately small percentage of people of color in the data. As a result, the widely used Framingham Risk Score—which estimates an individual’s 10-year cardiovascular risk—yields estimates that are nearly 20 percent lower for Black patients than for white patients with the same set of clinical characteristics.

Access to care is also a source of the labeling problem. For instance, an algorithm might conclude that a Black patient no longer needs care when they actually discontinue treatment because various reasons—such as a lack of transportation, job issues, or family obligations—prevent them from getting to the medical center. Finally, heterogeneity of effects includes high rates of false negatives for Black patients because training data can miss factors that occur more frequently in minority groups.
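
To make these failure modes concrete, here is a small illustrative sketch (not drawn from Parikh's work; it uses synthetic data and scikit-learn) in which one group is undersampled in training and carries a risk factor the model never observes. Auditing false-negative rates separately by group, along the lines Parikh describes, surfaces the resulting disparity.

```python
# Illustrative only: synthetic data showing how undersampling one group,
# combined with an unmeasured group-specific risk factor, can inflate that
# group's false-negative rate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_cohort(n, risk_shift):
    """Simulate patients: one observed clinical feature plus a hidden,
    group-specific risk factor the model never sees."""
    x = rng.normal(size=(n, 1))
    hidden = rng.normal(loc=risk_shift, size=n)
    y = (x[:, 0] + hidden + rng.normal(scale=0.5, size=n) > 1.0).astype(int)
    return x, y

# Group A is well represented in training; group B is undersampled.
xa, ya = make_cohort(5000, risk_shift=0.0)
xb, yb = make_cohort(300, risk_shift=0.7)
model = LogisticRegression().fit(np.vstack([xa, xb]), np.concatenate([ya, yb]))

# Audit: compare false-negative rates on balanced held-out cohorts.
for name, shift in [("A", 0.0), ("B", 0.7)]:
    x_test, y_test = make_cohort(5000, risk_shift=shift)
    pred = model.predict(x_test)
    fn_rate = np.mean((pred == 0) & (y_test == 1)) / np.mean(y_test == 1)
    print(f"group {name}: false-negative rate = {fn_rate:.2f}")
```

Real clinical data is far messier, but the audit step itself—comparing error rates across groups on held-out data—is the same.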

Although these biases are statistical, they have obvious connections to larger systemic problems that include transportation disparities, insurance issues (particularly for those without job-related benefits), a lack of hospitals in rural areas and minority neighborhoods, and so forth. AI did not create these problems, but it certainly should not perpetuate them either. Avoiding that outcome requires active intervention.

Trust and Explainability

“When we’re talking about machines and algorithms, one goal should be understanding biases,” Elham Tabassi, a researcher at the U.S. National Institute of Standards and Technology (NIST), said. “We need to understand what biases [are present]; have tools to quantify them; and [have] practices, standards, and documents about how to manage and mitigate them.”

Because it defines standards for both government and industry, NIST is particularly concerned with the complications of AI applications. During her AAAS presentation, Tabassi addressed the need for trustworthy AI—with the unspoken implication that many current machine learning implementations are not trustworthy enough.

“Trust and risk are two sides of the same coin,” she said. “What things should we be worried about? Discrimination is one, bias is one, accuracy is one. Can we build zero-bias algorithms? Maybe one day. The same thing [is true] for machines as for humans: we are not going to reach zero bias. The expectation is understanding, identifying, and managing bias.”

These issues are present in many current applications—from self-driving cars to facial recognition—thus adding a sense of urgency to the need for standards and accountability. “The issue with healthcare AI is that it uniquely has the potential to mask bias and make it seem like you’re generating an accurate prediction,” Parikh said. “A lot of the output from AI and machine learning is a black box. We don’t understand the variables that go into a prediction, [which] exists as a complex association of nonlinear relationships.”

This concept—called explainability in computer science terms—is separate from bias, though it can have similar effects in terms of trust and risk assessment. To complicate matters, explainability varies widely from application to application. “What developers expect from an explainable AI is very different than [what] the financial sector [expects], based on the legal requirements needed from explainability of the algorithms,” Tabassi said. “We’re bringing enough knowledge and understanding about the risks…to develop a risk management framework as a tool for everybody to make the right decisions.”
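
One common, model-agnostic way to begin opening the black box is permutation importance: scramble one input at a time and measure how much the model's accuracy drops. The sketch below is a minimal, hypothetical illustration; it assumes an already-fitted scikit-learn-style classifier and held-out arrays `X_test` and `y_test`, and it is not tied to any particular medical algorithm.

```python
# Minimal permutation-importance probe for an already-fitted classifier.
# `model`, `X_test` (n_samples x n_features array), and `y_test` are assumed.
import numpy as np

def permutation_importance(model, X_test, y_test, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = np.mean(model.predict(X_test) == y_test)   # accuracy with intact data
    importances = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            # Scramble feature j, breaking its link to the outcome.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(model.predict(X_perm) == y_test))
        importances[j] = np.mean(drops)   # average accuracy lost when feature j is scrambled
    return importances
```

scikit-learn ships a more complete version as `sklearn.inspection.permutation_importance`, and richer attribution methods exist, but even a probe this simple makes a prediction less of a black box than a bare risk score.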

Do No Harm

Healthcare naturally has separate ethical standards from other fields that utilize AI, including the classic “do no harm” mantra. This sentiment may limit deployment until researchers can solve some of the problems that are related to bias and trust. Nevertheless, both Parikh and Tabassi are hopeful. “It’s going to be rare even five or 10 years from now that an AI device replaces a healthcare worker, because the decisions are so high stakes,” Parikh said. “AI hasn’t gotten to a performance level yet where we’re talking about replacing humans, and ultimately I don’t think it’s safe to replace clinicians. It can be reassuring that there is a human check on a potentially biased algorithm.”

The reverse—a machine check on potentially biased humans—is also true in ideal circumstances. Parikh believes that assistance could come from a simple tool with widespread use in clinical settings: checklists, which help surgeons and other workers keep track of every step during a complicated procedure. “At various agencies, there are preliminary checklists around appropriate reporting for potential bias in an algorithm for publication,” he said. “But we need checklists when it comes to potentially vetting an algorithm for clinical implementation as well. That type of checklist could really help clinicians filter through a lot of the noise and difficult-to-understand concepts of what it takes to declare an algorithm biased.”
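
The article does not specify what such a vetting checklist would look like in code; the sketch below is one hypothetical way to encode it, with items that merely paraphrase the concerns raised above rather than any official standard.

```python
# Hypothetical pre-deployment vetting checklist for a clinical algorithm.
# The items paraphrase concerns from the article; they are not an official standard.
VETTING_ITEMS = [
    "Training data representative of the deployment population?",
    "Error rates (e.g., false negatives) reported separately by subgroup?",
    "Labels reflect true clinical need rather than access to care?",
    "Key predictive variables explainable to the treating clinician?",
    "Plan in place to monitor post-deployment drift and bias?",
]

def ready_for_deployment(answers):
    """answers maps each checklist item to True (pass) or False (fail)."""
    missing = [item for item in VETTING_ITEMS if not answers.get(item, False)]
    if missing:
        print("Algorithm not cleared. Outstanding items:")
        for item in missing:
            print(" -", item)
        return False
    return True

# Example: a reviewer has addressed every item except subgroup error reporting.
answers = {item: True for item in VETTING_ITEMS}
answers["Error rates (e.g., false negatives) reported separately by subgroup?"] = False
ready_for_deployment(answers)
```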

One such check is software that requires doctors to hold end-of-life conversations with patients, conversations that they often neglect to have with people of color. Simple, automated prompts of this sort that remind practitioners to avoid bias might make a major difference in quality of care.
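
The article does not describe the software itself, but a prompt of this sort can be as simple as a rule over the patient record; the sketch below is purely hypothetical, and its field names are invented for illustration.

```python
# Hypothetical reminder rule: flag charts that lack a documented
# end-of-life (goals-of-care) conversation. Field names are invented.
def needs_goals_of_care_prompt(chart):
    notes = chart.get("notes", [])
    return not any(note.get("type") == "goals_of_care" for note in notes)

chart = {"patient_id": "12345", "notes": [{"type": "progress", "text": "..."}]}
if needs_goals_of_care_prompt(chart):
    print("Reminder: no goals-of-care conversation documented for this patient.")
```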

AI cannot solve systemic problems in the healthcare sector on its own. However, researchers and clinicians are becoming more aware of how and where bias creeps in, rather than assuming that computers are free of such issues. Establishing standards will help achieve trust and identify the origins of biases. Even though computers cannot care the way that humans do, they can help us fix our own shortcomings in the medical field.

Matthew R. Francis is a physicist, science writer, public speaker, educator, and frequent wearer of jaunty hats. His website is BowlerHatScience.org.
