March 11, 2026

In the first article of this series, we looked at how AI is already reading scans, flagging sepsis, and even suggesting treatment options at the bedside. If diagnostic AI is reshaping how doctors practice, the next wave is reshaping how they learn. Large language models, generative tools, and virtual patient simulators are moving into medical education, promising more personalized, scalable, and safe training. But they also raise uncomfortable questions: How much should future doctors rely on AI? How do we prevent erosion of clinical judgment? And what does competence mean when your most available “teaching partner” is a model, not a mentor?
AI is now being incorporated into every stage of training to create personalized, scalable, and risk-free learning experiences.

AI tools are transforming the creation of medical education materials and reshaping faculty workflows in ways that were almost unthinkable five years ago.

Stanford Medicine’s AI Initiative integrates tools like SecureGPT, OpenEvidence, and NotebookLM to teach AI literacy alongside clinical skills. Harvard Medical School’s AI in Clinical Medicine course covers diagnosis, outcome prediction, and treatment planning. The aim isn’t just “AI skills,” but a new kind of clinical judgment: knowing how and when to lean on the model, when to question it, and how to explain that reasoning to patients and colleagues.
Training now includes prompt engineering (crafting focused, testable queries) and critical appraisal of AI-generated information. This matters: studies show diagnostic performance improves substantially with well-structured prompts, particularly in complex cases [7].
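To make that concrete, here is a minimal sketch, in Python, of the difference between a vague query and a focused, testable one. The case details, the template, and the function name are illustrative assumptions, not a validated clinical protocol; the sketch only assembles the prompt text, and whatever LLM client an institution actually provides would sit behind it.

```python
# A sketch of structured clinical prompting, contrasting a vague query with
# a focused, testable one. The case details and template are illustrative
# assumptions, not a validated protocol.

def build_structured_prompt(age: int, sex: str, presentation: str,
                            findings: list[str], question: str) -> str:
    """Assemble a diagnostic prompt with an explicit role, structured
    clinical data, and a request for ranked, falsifiable output."""
    lines = [
        "Role: You are assisting with a differential diagnosis exercise.",
        f"Patient: {age}-year-old {sex}.",
        f"Presentation: {presentation}",
        "Key findings:",
        *[f"- {finding}" for finding in findings],
        f"Task: {question}",
        "Format: For each of the top 3 diagnoses, list the supporting",
        "findings, the findings that argue against it, and one test that",
        "would confirm or exclude it.",
    ]
    return "\n".join(lines)

# A vague prompt invites a vague, unverifiable answer.
vague = "What's wrong with a patient who has chest pain?"

# The structured version gives the model constraints a learner can check.
focused = build_structured_prompt(
    age=58,
    sex="male",
    presentation="2 hours of substernal chest pressure radiating to the jaw",
    findings=["diaphoresis", "normal initial troponin",
              "ST depression in leads V4-V6"],
    question="Rank the most likely diagnoses and explain how to "
             "discriminate between them.",
)
print(focused)
```

The structure is what makes appraisal possible: because the output must be ranked and tied to named findings and tests, a learner can check each claim instead of accepting a fluent paragraph wholesale.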

AI literacy cannot end at graduation. These models evolve rapidly, with each year bringing more capable versions.
Practicing clinicians increasingly encounter LLM-driven decision support systems in their hospitals, and they need structured training to use them well.
Such training allows clinicians to get the most out of AI while preserving human judgment. As pediatrician and medical educator Dr. Lakshmi Krishnan observes, “When I see patients, I’m not simply processing data points but interpreting a narrative shaped by cultural context, experience, and human interaction.” AI excels at pattern-matching but struggles with that nuance, making clinician oversight essential.
One anecdote brings this home. In a recent episode of the medical drama “The Pitt,” an AI scribe entered incorrect details into a patient’s chart, and my niece, an ER nurse who was watching with me, immediately said, “That’s totally on the doctor. She should have reviewed it before submitting.”
Her response captures a critical truth: AI can streamline workflow, but accountability stays with the clinician. Studies confirm this: clinicians trained to scrutinize AI output for errors (e.g., hallucinations or contextual mismatches) achieve higher diagnostic accuracy and show less overreliance, much as prompt engineering sharpens inputs [8]. Without that review habit, even advanced tools falter.
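That review habit can also be built into the workflow itself. Below is a minimal sketch of a “review before submit” gate for AI-drafted notes; every class, field, and message here is a hypothetical illustration rather than any real EHR’s API, but the shape of the idea holds: nothing is filed until a named clinician attests to it.

```python
# A minimal sketch of a "review before submit" gate for AI-drafted chart
# notes. All classes and fields here are hypothetical illustrations, not a
# real EHR integration; the point is that the draft cannot be filed until a
# named clinician reviews it and takes ownership of the result.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DraftNote:
    patient_id: str
    ai_draft: str
    reviewed_by: str | None = None
    corrections: list[str] = field(default_factory=list)
    signed_at: datetime | None = None

    def attest(self, clinician: str, corrections: list[str]) -> None:
        """Record that a clinician reviewed the draft, noting any fixes."""
        self.reviewed_by = clinician
        self.corrections = corrections
        self.signed_at = datetime.now(timezone.utc)

    def submit(self) -> str:
        # Refuse to file anything a human has not signed off on.
        if self.signed_at is None:
            raise PermissionError(
                "AI draft cannot be filed without clinician attestation")
        return (f"{self.ai_draft}\n[Reviewed by {self.reviewed_by} "
                f"at {self.signed_at:%Y-%m-%d %H:%M} UTC]")

note = DraftNote(patient_id="12345",
                 ai_draft="Pt reports left knee pain x3 days...")
note.attest("Dr. A. Example",
            corrections=["laterality corrected: right knee, not left"])
print(note.submit())  # files only because attest() ran first
```

The design choice mirrors the anecdote: the gate does not make the AI more accurate, it makes the clinician’s review non-optional.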
But shifting the culture to accept AI, particularly in the diagnostic arena, presents real challenges. Clinical hierarchies are built on seniority and experience; an LLM has neither. When AI suggests a diagnosis that conflicts with a physician’s intuition, the default response is often to dismiss it. CME needs to address that directly, reframing AI from “junior assistant” to a capable reasoning partner whose suggestions should be interrogated and weighed: not obeyed by default, but not dismissed out of hand either. Learning how to interact with and query this partner effectively will become a vital skill, one with the potential to substantially improve patient outcomes in complex and rare diseases, where primary care physicians typically have little training [9].

On the plus side, AI is available 24/7, offers unlimited practice without burdening real patients, provides consistent evaluation criteria, and can personalize learning in ways traditional curricula struggle to match. But LLMs still have significant limitations that educational systems must proactively address.
Ultimately, if students delegate analytical tasks to GenAI and aren’t taught to critically assess the output, it can erode the independent judgment and error-based expertise essential for clinical practice. Instruction in AI-based systems and tools must therefore be accompanied by pedagogical scaffolding, digital literacy, and faculty oversight to prevent passivity, misinformation uptake, and ethical lapses like plagiarism [14].

The goal is not AI‑dependent physicians, but AI‑fluent ones: clinicians who can use these tools to sharpen their reasoning while preserving the human connection that makes medicine work. Just as we argued in our AI in Diagnostics article that AI must be treated as a supervised colleague, not an oracle, medical education must teach future doctors to supervise AI as carefully as they supervise students. That means pairing AI integration with robust hands‑on training, a continued emphasis on humanistic care, and clear ethical frameworks for when, and how, these powerful systems should be used.
Disclaimer: The mention of specific companies, products, or organizations in this article is for informational purposes only and does not imply endorsement. The companies whose products were referenced were not consulted or involved in the preparation of this content, nor did they provide any funding or compensation.
References
[1] N. Golchini, E. Passalacqua, L. Vaughn, R.-E. E. Abdulnour, T. Zack, and S. Finlayson, “Socratic AI: An Adaptive Tutor for Clinical Case Based Learning,” medRxiv, p. 2025.06.22.25329661, Jan. 2025, doi: 10.1101/2025.06.22.25329661.
[2] D. Jang et al., “MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, I. Habernal, P. Schulam, and J. Tiedemann, Eds., Suzhou, China: Association for Computational Linguistics, Nov. 2025, pp. 319–353, doi: 10.18653/v1/2025.emnlp-demos.24.
[3] J. Kang and J. Ahn, “Technologies, opportunities, challenges, and future directions for integrating generative artificial intelligence into medical education: a narrative review,” Ewha Med. J., vol. 48, no. 4, p. e53, Oct. 2025, doi: 10.12771/emj.2025.00787.
[4] I. Zafar et al., “Ten tips for utilizing AI to generate high quality OSCE stations in medical education,” Front. Med., vol. 12, p. 1744657, 2025, doi: 10.3389/fmed.2025.1744657.
[5] D. Sikri and V. Nuguri, “From notes to networks – A three-dimensional framework with integrated curation for using Google NotebookLM in health professions education,” Med. Teach., pp. 1–5, Jul. 2025, doi: 10.1080/0142159X.2025.2533408.
[6] Z. Zouakia, E. Logak, A. Szymczak, J.-P. Jais, A. Burgun, and R. Tsopra, “AI-Driven Objective Structured Clinical Examination Generation in Digital Health Education: Comparative Analysis of Three GPT-4o Configurations,” JMIR Med. Educ., vol. 12, p. e82116, 2026, doi: 10.2196/82116.
[7] F. E. A. Hassanein, Y. Ahmed, S. Maher, A. El Barbary, and A. Abou-Bakr, “Prompt-dependent performance of multimodal AI model in oral diagnosis: a comprehensive analysis of accuracy, narrative quality, calibration, and latency versus human experts,” Sci. Rep., vol. 15, no. 1, p. 37932, 2025, doi: 10.1038/s41598-025-22979-z.
[8] S. S. Everett et al., “From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis,” medRxiv, Jun. 2025, doi: 10.1101/2025.06.07.25329176.
[9] T. Abdullahi, R. Singh, and C. Eickhoff, “Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models,” JMIR Med. Educ., vol. 10, p. e51391, Feb. 2024, doi: 10.2196/51391.
[10] I. A. Qazi, A. Ali, A. U. Khawaja, M. J. Akhtar, A. Z. Sheikh, and M. H. Alizai, “Automation Bias in Large Language Model Assisted Diagnostic Reasoning Among AI-Trained Physicians,” medRxiv, p. 2025.08.23.25334280, Jan. 2025, doi: 10.1101/2025.08.23.25334280.
[11] F. Kücking et al., “Impact of AI recommendation correctness on diagnostic accuracy in clinical decision-making,” Int. J. Med. Inform., vol. 207, p. 106223, 2026, doi: 10.1016/j.ijmedinf.2025.106223.
[12] L. Dahmani and V. D. Bohbot, “Habitual use of GPS negatively impacts spatial memory during self-guided navigation,” Sci. Rep., vol. 10, no. 1, p. 6310, Apr. 2020, doi: 10.1038/s41598-020-62877-0.
[13] L. Hejtmánek, I. Oravcová, J. Motýl, J. Horáček, and I. Fajnerová, “Spatial knowledge impairment after GPS guided navigation: Eye-tracking study in a virtual town,” Int. J. Hum. Comput. Stud., vol. 116, pp. 15–24, 2018, doi: 10.1016/j.ijhcs.2018.04.006.
[14] J. S. Izquierdo-Condoy, M. Arias-Intriago, A. Tello-De-la-Torre, F. Busch, and E. Ortiz-Prado, “Generative Artificial Intelligence in Medical Education: Enhancing Critical Thinking or Undermining Cognitive Autonomy?,” J. Med. Internet Res., vol. 27, p. e76340, Nov. 2025, doi: 10.2196/76340.