Welcome to the fourth of an ongoing series of roundtable discussions among Chartis consulting leaders around the emerging reality of artificial intelligence (AI) in healthcare.
As generative AI capabilities and implementation use cases rapidly advance, ethical concerns have likewise multiplied. March of 2023 saw a 5-fold increase over the previous year in media mentions of ethical and responsible AI, while recent headlines have featured health-related AI applications producing wildly harmful outputs—from an eating disorder wellness chatbot that encouraged users to limit their calorie intake to algorithms that led to cutting off payment for care of acutely ill individuals who later died.1 How can organizations navigate these complex ethical concerns as AI becomes a more prominent part of healthcare’s future?
Join Tom Kiesau, Chartis Chief Innovation Officer and Head of Chartis Digital; Jody Cervenak, Chartis Informatics and Technology Practice Leader; Julie Massey, MD, Chartis Clinical IT Practice Leader; and Jon Freedman, Partner in Chartis Digital, as they discuss AI, what Chartis is seeing in real time, and what they think is coming next.
Tom Kiesau: Welcome back, everyone. Let’s return to a problem we briefly referenced in our last roundtable and unpack it a bit more: What does it mean to have a black box in healthcare AI, and why is that an issue?
In a nutshell, AI mechanisms, algorithms, and the data behind the scenes make recommendations, but we have very little insight into how and why those recommendations are made. This has always been true to some extent, but now, with large language models (LLMs), true transparency is all the more difficult.
A good example is the recent launch of a wellness chatbot around eating disorders that was giving blatantly inaccurate advice, scaring people into unhealthy and even dangerous behaviors. It was unclear why the chatbot itself was going down the path it was—or why the creators were unable to solve for that in advance.
Julie Massey, MD:
In that example, understanding and vetting the inputs would have been essential. The chatbot’s advice wasn’t tailored to the particular population or driven by clinical perspectives.
In addition to the inputs, it’s also important to understand when an AI algorithm is being applied to decision-making—and how. For instance, AI is sometimes used for prior authorization, but it’s often unclear when the AI recommendation should be used as guidance to help inform clinical judgment and when it should be followed as policy.
Transparency about the inputs and about how the algorithms arrive at their recommendations is critical so that end users have the information they need to apply those outputs wisely.
Tom: Some AI models have so many inputs that it is not humanly possible to understand all the correlations and causations. When it’s not a simple “this correlates with that,” how should healthcare leaders think through the various components and decide what is acceptable?
Some of these LLMs and generative AI models are delivering answers at speeds and with a level of accuracy and depth of insight that were previously unattainable—but there are trade-offs in our current understanding. The fundamental question is: How much should you trust the black box? How much do you have to take it apart and understand it to capture the real value it provides? The more you have to take it apart because of a lack of trust, the less you’re going to benefit from the advantages these evolving approaches offer. That’s what people have to wrestle with.
For true data quality, you need to be able to track back to the original source and understand it. Leaders should consider defining which sources can be used, what tasks they’re expecting the AI to do, and when AI recommendations should be leveraged for decision-making.
We know from studies of electronic health record (EHR) use that even the order of listed suggestions can influence how often clinicians select specific answers. Clearly communicating the inputs and appropriate end uses for AI will be important for ensuring constructive use of these tools.
Additionally, certain applications may be more straightforward and clinically appropriate. For instance, AI could check for drug interactions. The AI output would flag potential harm, which the clinician could fold into the patient’s treatment plan. This application, as a failsafe enabling better quality, is a vastly different proposition from using AI recommendations to create the patient’s treatment plan.
Tom: That brings us to specific use cases. Given these black box concerns, which ones should executives consider pursuing in the near term? And how can leaders engender trust with their use?
Clinical use cases require the highest degree of rigor in development, as well as line of sight into how and why the recommendations are made.
This goes back to the concept of automation and augmentation. If you’re going to support clinical judgment with AI, you need to create insight into at least the key factors that are contributing to that recommendation. Otherwise, you’re not augmenting; you’re superseding clinical judgment in decision-making.
If you do have the necessary insight, then start with tried-and-true, lower-risk scenarios for piloting AI. See how it fits in. Then build on your learnings. By taking baby steps (like using AI to adapt patient education materials to the right reading level), you can double-check how the AI works in practice, establish trust in the outputs, and build from there.
Transparency is equally important for building trust on the patient side. For instance, a lot of organizations are considering or already using chatbots. Many of these chatbots already sound human-like—and will sound more and more human as the technology evolves. Many consumers are OK with speaking to chatbots—especially if the answers are correct—but it’s important to disclose up front that they are interacting with one, and to provide an outlet for people to engage with a human if they prefer.
Tom: Establishing rigorous guardrails and governance specific to AI will be essential to protect against negative outcomes and build trust among both patients and care teams. What should that look like, and when should you communicate to patients that AI is being used?
Leveraging AI appropriately could become a strategic differentiator for health systems that get it right. As we’ve seen with some prominent examples in the media recently, it can become a real disadvantage for organizations that get it wrong.
And there are a lot of ways to get it wrong. Health systems that don’t have clearly defined guardrails in place will have these problems. Regulators by and large aren’t there yet, so health systems need to be proactive.
One of those guidelines needs to address when to pull in the human touch. You also need a commitment to identify when human interaction and decision-making will override the algorithms. And where there are opportunities to leverage automated interactions—such as behavioral health support, coaching, or counseling—patients need to be able to make that conscious choice as consumers.
AI governance needs to consist of both a set of guidelines and a risk management function within the health system. You need to have your legal team involved as well as ethical experts who will help you unpack these thorny issues. And you need to have an ongoing process for evaluating compliance, performance, and impacts.
Additionally, guidance needs to come from the top around when and how to best leverage new use cases, while meeting the health system’s established standards of fidelity.
To Julie’s point, considering the human touch is incredibly important. To use an earlier example, customer service chatbots can offer great benefit and convenience because of their 24/7 availability. But if meaningful human interactions are a differentiator for your organization, determine how you will preserve that human touch as you leverage AI to make your human involvement more efficient and impactful.
And clearly establish when you will communicate with patients about the use of AI. That will likely depend on the level of human involvement. For instance, if you are using AI simply as an efficiency aid, but everything has in-depth human review and oversight, disclosure may not be necessary. One example might be an autogenerated communication that is reviewed and edited by a human before the organization sends it to the patient. But anything that proxies a human response needs to be disclosed. Patients need to know, and they’ll trust you more as a result.