Escaping Pilot Purgatory: Healthcare’s GenAI Future Will Be Decided by Deployment, Not Hype
By: N. Adam Brown, MD, MBA
Healthcare has moved past the stage where generative AI can win attention on novelty alone. The market is still crowded with demos, pilots, and inflated promises, but that is no longer the interesting part. The more serious conversation now revolves around deployment.
McKinsey reported that 85% of healthcare leaders were interested in generative AI tools or had implemented them in 2025, but almost half (47%) were still waiting to see how other organizations fared.
That gap between experimentation and scaled deployment is where many organizations are now stuck, and it’s a place I call “pilot purgatory.”
Why so many pilots never reach scale
The reasons are not especially mysterious. Generative AI is easy to test in a contained setting, but much harder to operationalize across governance, compliance, clinical leadership, patients, and daily workflows.
A lot of the confusion starts with the fact that healthcare is dealing with two very different kinds of AI.
Traditional predictive models used in risk scoring, deterioration monitoring, and utilization forecasting are more bounded and easier to validate. They behave in more consistent ways, which carries weight in an industry built around accountability.
Probabilistic large language models (LLMs) are different. They are phenomenal at summarization, drafting, and making sense of unstructured information, but they work by predicting what is most likely to come next, not by verifying that whatever comes next is correct.
Why clinical use moves more slowly
The FDA has explicitly warned that so-called AI “hallucinations” may present a significant challenge in healthcare applications where highly accurate, truthful information is critical.
A plausible but wrong answer in a search engine is irritating. A plausible but wrong answer in a clinical context is something else entirely. Patient safety, compliance exposure, and immediate distrust among the clinicians required to use the tool are all very real and serious concerns.
This is where probabilistic LLMs run into a basic healthcare reality: if the output cannot be trusted consistently, it will not be allowed near higher-risk clinical decisions.
So health systems are doing what they usually do when uncertainty meets risk: they are segmenting.
More established and predictable models are still the better fit for clinical decision support and other high-risk uses tied to diagnosis, treatment, or triage.
Generative models, on the other hand, are getting adopted first in places where the risk is lower and the payoff is more obvious and immediate.
Where GenAI is gaining traction
The fastest returns are coming from administrative work that has been painfully inefficient for years.
This includes:
Documentation support and chart summarization
Revenue cycle and administrative workflows, including prior authorizations, claim support, and utilization review prep
Patient-facing communication, where clinical language can be translated into something more understandable
The McKinsey research cited above shows growing interest in operational use cases, while related work on healthcare service operations continues to identify automation and workflow support as areas with meaningful upside.
That should not surprise anyone; again, administrative simplification is where the value is easiest to prove.
That does not mean diagnostic AI is dead. It simply means it is moving at a pace dictated by the need for stronger validation, clearer accountability, and better governance.
What separates progress from theater
Many organizations assumed that generative AI could be bought, integrated, and rolled out like any other piece of healthcare IT.
But it doesn’t work that way. The technology behaves more like a junior staff member: highly capable, occasionally wrong, and in constant need of supervision.
The health systems that escape pilot purgatory are the ones willing to exercise restraint and be more disciplined about where AI tools belong.
That restraint starts with a few questions:
Where does this solve a real operational problem?
Where can the output be reviewed before it affects patient care or day-to-day operations?
Where can we show a measurable return in time, cost, or workflow efficiency?
This mindset is less flashy than the conference version of AI strategy, but it’s much closer to what healthcare adoption requires: careful consideration of governance, risk, and compliance constraints.
The industry does not need more GenAI theatrics. It needs deployment in places where the upside is real and the workflow can support the technology.
That is how pilot purgatory ends.
References
https://www.fda.gov/media/182871/download
https://www.deloitte.com/az/en/issues/generative-ai/state-of-generative-ai-in-enterprise.html