
AI in Healthcare (1): The Machine Will See You Now — AI in Diagnostics

February 24, 2026

AI has moved far beyond the hype in healthcare: it’s now reading scans, drafting notes, assessing risk, and even proposing treatment options. But as the pace accelerates, so do clinicians’ concerns about skill erosion, liability, and whether AI can truly deliver on what it promises.

This article opens a four-part “AI in Healthcare” series exploring how these changes affect patients, medical training, frontline clinicians, and hospital operations. We start where the impact is both most visible and most time-critical: diagnostics, and the emerging role of AI in supporting treatment planning. AI can now flag strokes and sepsis in real time and sketch out possible regimens, yet it still misses life-threatening diagnoses, hallucinates confident nonsense, and bakes in existing biases. As hospitals race to weave these tools into clinical workflows, the real question is not whether AI can see patterns humans can’t, but whether we can turn it into a safe diagnostic colleague instead of an impressive, unpredictable black box.

Diagnostics is where AI is already changing day-to-day medicine the fastest. In radiology suites, pathology labs, and emergency departments, AI systems are increasingly used to flag urgent findings, prioritize high-risk cases, and synthesize fragments of clinical data into actionable signals. These systems are designed to augment (not replace) clinician expertise at the point where decisions are most time-sensitive. This is a booming market, valued at US$1.77 billion in 2025 and expected to grow to over US$9 billion by 2031. Yet as diagnostic AI becomes more capable and more tightly embedded in hospital workflows, it also raises sharper questions about real-world accuracy across diverse patients, clinician trust and “automation bias,” and how to preserve the irreplaceable role of human judgment when the stakes are highest.

The Diagnostic Revolution

AI’s greatest clinical foothold is in medical imaging. By late 2025, the U.S. FDA had cleared over 1,000 AI-enabled radiology devices, making imaging the single largest category of approved medical AI. Platforms like Aidoc scan CT images in real time, flagging life-threatening conditions such as strokes, pulmonary embolisms, and hemorrhages so radiologists can prioritize the most urgent cases [1]. Viz.ai goes further, identifying suspected strokes from imaging data and automatically coordinating care teams to speed treatment decisions. For point-of-care imaging, Caption Health helps clinicians with limited ultrasound experience acquire diagnostic-quality cardiac images. Butterfly iQ takes this a step further, pairing its handheld ultrasound probe with AI guidance, so nonexpert clinicians, including those in rural or resource-limited clinics without onsite sonographers, can capture and interpret scans at the bedside [2].

In Canada, public-private partnerships are building an advanced AI-enabled wound care screening and assessment system. This initiative, led by Swift Medical and partners, integrates AI technologies into a digital wound care platform to deliver objective wound assessment, real-time treatment support, and predictive healing analysis. By automating key clinical tasks (for which physicians typically receive just 10 hours of training), the project aims to improve accuracy, equity, and access to high-quality wound care across diverse care settings in Canada, reducing the $4 billion annual burden of treating chronic wounds.

AI diagnostic tools excel at pattern recognition in narrow, well-defined tasks, bolstered by comprehensive, expert-vetted training data. In some studies, AI has matched or exceeded expert performance in error detection, skin lesion classification, and ovarian tumor scoring, particularly when 3D imaging is available [3], [4]. A widely cited randomized trial found that GPT-4 alone outperformed physicians in diagnostic accuracy: 92% for the model alone versus 74% for physicians working conventionally, and 76% for physicians using AI collaboratively [5]. Physicians scored poorly in part because many were not trained to interrogate AI output: they used weak prompts, skimmed responses, and lacked a consistent method for verifying recommendations and integrating them into clinical reasoning. That gap highlights how essential structured physician training will be if AI is to improve diagnostic performance safely and reliably.

Beyond Imaging: Predictive and Multimodal AI

Multimodal AI systems represent the next frontier, combining signals from imaging, lab values, genomic data, and clinical notes to build a more complete diagnostic picture. Google’s experimental AMIE system, trained on medical conversations, performed as well as or better than primary care physicians in history-taking, diagnostic accuracy, and even empathy. Patient-facing tools like Ada Health, DxGPT, and OpenEvidence offer symptom assessment, triage, and literature-backed treatment recommendations.

The Prenosis Sepsis ImmunoScore is a leading example of how multimodal diagnostic AI is moving from promising prototype to tool embedded in real hospital workflows. Trained on over 100,000 blood samples and data from over 25,000 patients, this first-in-class FDA-authorized AI sepsis diagnostic analyzes 22 parameters (including vitals and biomarkers) to stratify patients into risk categories predicting sepsis, mortality, and ICU transfer [6]. In a disease where minutes matter, this tool helps care teams identify high-risk patients sooner. It is already integrated into Carle Foundation’s EHR workflows and is now being adopted at other hospitals.
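
To give a sense of what “stratifying patients into risk categories” looks like mechanically, here is a minimal Python sketch of a multimodal risk score. The inputs, weights, and thresholds below are invented for illustration and bear no relation to the ImmunoScore’s actual 22-parameter model.

    # Toy multimodal risk stratification (illustrative only, not Prenosis's model):
    # combine a few vitals and biomarkers into one score, then map to risk bands.

    def sepsis_risk_score(heart_rate, resp_rate, temp_c, lactate, wbc):
        """Weighted sum of deviations from rough 'normal' values (made-up weights)."""
        score = 0.0
        score += max(0, heart_rate - 90) * 0.02   # tachycardia
        score += max(0, resp_rate - 20) * 0.05    # tachypnea
        score += abs(temp_c - 37.0) * 0.3         # fever or hypothermia
        score += max(0, lactate - 2.0) * 0.5      # elevated lactate (mmol/L)
        score += max(0, wbc - 12.0) * 0.05        # leukocytosis (10^9/L)
        return score

    def risk_band(score):
        if score >= 2.0:
            return "high"    # e.g. alert the care team immediately
        if score >= 1.0:
            return "medium"  # e.g. repeat labs, closer monitoring
        return "low"

    print(risk_band(sepsis_risk_score(118, 26, 38.9, 3.4, 15.2)))  # -> "high"

The value of such a tool lies less in the arithmetic than in the packaging: many continuous inputs become a single, actionable signal that can be surfaced inside the EHR at the moment it matters.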

Multimodal AI is also beginning to support shared decision-making. Facing a cancer diagnosis and an uncertain future, Steve Brown created his own set of AI agents to aid in diagnosis and treatment planning. He doesn’t see AI as a replacement for his oncology team, but as a partner in developing individualized care plans, given that AI is better at keeping up with new medical discoveries and at teasing out meaning and connections from mountains of data.

Emerging Rapidly: Treatment Planning

AI is beginning to show value in treatment planning, especially in cancer care, but it is not ready to run on autopilot. In radiotherapy planning, LLM-augmented automated planning can better incorporate institutional constraints and generate plans that meet complex clinical goals, yet it still requires clinician oversight to match each patient’s anatomy, risks, and treatment priorities [7]. When several large language models were tested on their ability to propose treatment plans for advanced cancer scenarios, they produced some potentially relevant ideas but also missed important details and left notable gaps, reinforcing that expert review is essential for safe, real-world use [8].

This is understandable: general-purpose LLMs are built to be good at language in general, not to reliably navigate the details of oncology evidence and guidelines. To address this deficit, an expert-guided LLM system (MEREDITH) was trained specifically on curated molecular datasets and oncology studies, and it produced more diverse treatment recommendations that were strongly validated by molecular tumor boards [9]. In addition to domain-specific training, MEREDITH implemented retrieval-augmented generation (RAG) and chain-of-thought reasoning to enhance the reliability and credibility of its output.

RAG improves results by having the system pull in relevant, case-specific sources (such as PubMed papers, guideline statements, trial databases, and drug approval status) at the time it answers, so its suggestions are anchored to actual, up-to-date evidence instead of educated-sounding guesses. Chain-of-thought then improves the reasoning process by forcing the model to work through (and explicate) the same steps an expert would: linking the patient’s mutations to targetable pathways, checking what evidence supports each option in that tumor type, considering approvals and trials, and proposing and justifying options. This stepwise structure helps reduce missed steps and makes the output easier for a molecular tumor board to review.
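
To make these two techniques concrete, here is a minimal Python sketch of a RAG-plus-chain-of-thought prompt builder. Everything in it (the tiny in-memory evidence base, the keyword-overlap retriever, the prompt wording) is invented for illustration; it is not MEREDITH’s implementation, and a real system would retrieve from live literature and trial databases using vector search.

    # Sketch: retrieve case-relevant sources, then build a prompt that forces
    # stepwise, evidence-linked reasoning. The final LLM call is omitted.

    EVIDENCE = [
        "Guideline excerpt: drug A is approved for tumors with mutation X.",
        "Phase II trial: drug B showed responses in mutation-Y tumors.",
        "Drug A label: contraindicated with strong CYP3A4 inhibitors.",
    ]

    def retrieve(query, k=2):
        """Crude keyword-overlap scoring; real systems use vector search."""
        q = set(query.lower().split())
        scored = [(len(q & set(doc.lower().split())), doc) for doc in EVIDENCE]
        return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]

    def build_prompt(case_summary):
        sources = retrieve(case_summary)
        evidence_block = "\n".join(f"- {s}" for s in sources)
        steps = (
            "Reason step by step:\n"
            "1. Link the patient's mutations to targetable pathways.\n"
            "2. Check which retrieved sources support each option.\n"
            "3. Consider approval status and open trials.\n"
            "4. Propose and justify options, citing the sources used."
        )
        return f"Case: {case_summary}\n\nRetrieved evidence:\n{evidence_block}\n\n{steps}"

    print(build_prompt("metastatic tumor with mutation X, candidate drug A"))

Because the evidence is retrieved at answer time and the reasoning steps are spelled out, a reviewer can check not only what was recommended but which sources supported each step.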

Where AI Falls Short

Despite impressive benchmarks, many deep learning models remain “black boxes,” leaving clinicians wondering why and how the AI reached a conclusion, eroding trust and complicating liability when errors occur. AI can also “hallucinate,” confidently generating medically incorrect information: in cancer care, for instance, AI treatment recommendations align with expert tumor boards only 50–70% of the time [3].

AI also struggles with diagnosis in real-world settings. In a randomized emergency department trial of the symptom-checker chatbots Ada and Symptoma, both missed potentially life‑threatening diagnoses in roughly 1 in 7 patients, and Ada undertriaged 13% of cases, making its recommendations unsafe to trust on their own [10].

A Stanford-led evaluation of AI therapy chatbots found that these systems sometimes stigmatized mental health conditions and sometimes responded inappropriately or unsafely in high-risk scenarios such as suicidality or delusional thinking, underscoring that fluent language does not equal reliable clinical judgment [11]. Algorithmic bias compounds these issues: when training data underrepresents minority populations, tools perform inconsistently across demographic groups, potentially widening existing health disparities [12], [13].

Perhaps most fundamentally, AI cannot replicate what makes medicine human. Trust is built through connection, shared decision-making, compassion, and the ability to navigate pain and uncertainty. Every patient deserves the thoughtful judgment, empathy and heartfelt understanding that only a human physician can offer. This kind of attentive, compassionate care becomes especially vital for individuals living with rare diseases, complex medical histories, or other high‑risk conditions.

Looking Ahead: Many Pieces Still Missing

AI in diagnostics is at an inflection point. The possibilities are tremendous, but a number of essential elements still must fall into place to unlock its true impact:

  • Training LLMs on high-quality curated medical datasets that are carefully annotated and verified by domain experts.
  • Incorporating RAG and chain-of-thought to leverage current evidence and expose the reasoning behind the answers provided. Together, these approaches let a system say not just what it recommends, but why, and which evidence it is using.
  • Building in robust guardrails and safety filters for medical use (e.g. suicide risk, pregnancy, pediatrics), including task-specific constraints (e.g. never invent doses, always cross-check drug interactions); a toy example of such a guardrail follows this list.
  • Ensuring tight workflow integration and human-in-the-loop design, with integration into EHRs, tumor boards and order entry in ways that support existing workflows and interfaces that make it easy to review, edit and override the model, reinforcing that clinicians remain accountable.
  • Equipping clinicians with practical skills for working with AI, including how to question, verify, and integrate model output.
  • Rigorously validating algorithms across diverse patient populations to avoid widening existing health disparities.
  • Defining clear lines of accountability when AI contributes to errors or near misses.
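
As a concrete illustration of the guardrail bullet above, here is a toy Python safety filter. The patterns and rules are invented for illustration; production guardrails are far broader and typically combine rule-based checks like these with model-based classifiers and human escalation paths.

    import re

    # Toy output guardrail (illustrative only): block model-invented doses and
    # flag high-risk topics for human review before anything reaches a user.
    DOSE_PATTERN = re.compile(r"\b\d+(\.\d+)?\s*(mg|mcg|g|ml|units)\b", re.IGNORECASE)
    HIGH_RISK_TOPICS = ("suicide", "self-harm", "overdose")

    def check_output(text, allow_doses=False):
        """Return a list of violations; an empty list means the text passes."""
        violations = []
        if not allow_doses and DOSE_PATTERN.search(text):
            violations.append("dose detected: doses must come from a verified formulary, not the model")
        for topic in HIGH_RISK_TOPICS:
            if topic in text.lower():
                violations.append(f"high-risk topic '{topic}': escalate to a clinician")
        return violations

    print(check_output("Start drug A at 50 mg daily."))  # flags the invented dose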

Above all, human judgment must remain at the center of care. AI can support clinicians in powerful ways, but it is not—and must never become—a replacement for qualified clinicians or genuine human connection.

Disclaimer: The mention of specific companies, products, or organizations in this article is for informational purposes only and does not imply endorsement. The companies whose products were referenced were not consulted, involved in the preparation of this content, nor did they provide any funding or compensation.

 


References

[1]  T. Weikert et al., “Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm,” Eur. Radiol., vol. 30, no. 12, pp. 6545–6553, Dec. 2020, doi: 10.1007/s00330-020-06998-0.

[2]  C. Baloescu et al., “Artificial Intelligence–Guided Lung Ultrasound by Nonexperts,” JAMA Cardiol., vol. 10, no. 3, pp. 245–253, Mar. 2025, doi: 10.1001/jamacardio.2024.4991.

[3]  S. Rahman, M. T. Hosain, N. Fahad, M. K. Morol, and M. J. Hossen, “Agentic artificial intelligence is the future of cancer detection and diagnosis,” Array, vol. 29, p. 100676, 2026, doi: 10.1016/j.array.2025.100676.

[4]  S. Mitchell et al., “Artificial Intelligence in Ultrasound Diagnoses of Ovarian Cancer: A Systematic Review and Meta-Analysis,” Cancers (Basel), vol. 16, no. 2, Jan. 2024, doi: 10.3390/cancers16020422.

[5]  E. Goh et al., “Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial,” JAMA Netw. Open, vol. 7, no. 10, pp. e2440969–e2440969, Oct. 2024, doi: 10.1001/jamanetworkopen.2024.40969.

[6]  B. Akhil et al., “FDA-Authorized AI/ML Tool for Sepsis Prediction: Development and Validation,” NEJM AI, vol. 1, no. 12, p. AIoa2400867, Nov. 2024, doi: 10.1056/AIoa2400867.

[7]  L. Yu et al., “Multicenter study on the versatility and adoption of AI-driven automated radiotherapy planning across cancer types,” Nat. Commun., vol. 17, no. 1, p. 867, 2025, doi: 10.1038/s41467-025-67581-z.

[8]  M. Benary et al., “Leveraging Large Language Models for Decision Support in Personalized Oncology,” JAMA Netw. Open, vol. 6, no. 11, pp. e2343689–e2343689, Nov. 2023, doi: 10.1001/jamanetworkopen.2023.43689.

[9]  J. Lammert et al., “Expert-Guided Large Language Models for Clinical Decision Support in Precision Oncology,” JCO Precis. Oncol., no. 8, p. e2400478, Oct. 2024, doi: 10.1200/PO-24-00478.

[10]  J. Knitza et al., “Comparison of Two Symptom Checkers (Ada and Symptoma) in the Emergency Department: Randomized, Crossover, Head-to-Head, Double-Blinded Study,” J. Med. Internet Res., vol. 26, p. e56514, Aug. 2024, doi: 10.2196/56514.

[11]  J. Moore et al., “Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers,” in Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25), New York, NY, USA: Association for Computing Machinery, 2025, pp. 599–627, doi: 10.1145/3715275.3732039.

[12]  R. M. Ratwani, K. Sutton, and J. E. Galarraga, “Addressing AI Algorithmic Bias in Health Care,” JAMA, vol. 332, no. 13, pp. 1051–1052, Oct. 2024, doi: 10.1001/jama.2024.13486.

[13]  J. Joseph, “Algorithmic bias in public health AI: a silent threat to equity in low-resource settings,” Front. Public Health, vol. 13, p. 1643180, 2025, doi: 10.3389/fpubh.2025.1643180.