March 11, 2026

In the first article of this series, we looked at how AI is already reading scans, flagging sepsis, and even suggesting treatment options at the bedside. If diagnostic AI is reshaping how doctors practice, the next wave is reshaping how they learn. Large language models, generative tools, and virtual patient simulators are moving into medical education, promising more personalized, scalable, and safe training. But they also raise uncomfortable questions: How much should future doctors rely on AI? How do we prevent erosion of clinical judgment? And what does competence mean when your most available “teaching partner” is a model, not a mentor?
AI is now being incorporated into every stage of training to create personalized, scalable, and risk-free learning experiences.

AI tools are transforming the creation of medical education materials and reshaping faculty workflows in ways that were almost unthinkable five years ago.

Stanford Medicine’s AI Initiative integrates tools like SecureGPT, OpenEvidence, and NotebookLM to teach AI literacy alongside clinical skills. Harvard Medical School’s AI in Clinical Medicine course covers diagnosis, outcome prediction, and treatment planning. The aim isn’t just “AI skills,” but a new kind of clinical judgment: knowing how and when to lean on the model, when to question it, and how to explain that reasoning to patients and colleagues.
Training now includes prompt engineering (crafting focused, testable queries) and critical appraisal of AI-generated information. This matters: studies show diagnostic performance improves substantially with well-structured prompts, particularly in complex cases [7].
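To make that concrete, here is a minimal sketch, in Python, of the difference between a vague query and a focused, testable one. The case details, the template, and the function name are illustrative assumptions, not a validated clinical protocol; the sketch only assembles the prompt text, and whatever LLM client an institution actually provides would sit behind it.

```python
# A sketch of structured clinical prompting, contrasting a vague query with
# a focused, testable one. The case details and template are illustrative
# assumptions, not a validated protocol.

def build_structured_prompt(age: int, sex: str, presentation: str,
                            findings: list[str], question: str) -> str:
    """Assemble a diagnostic prompt with an explicit role, structured
    clinical data, and a request for ranked, falsifiable output."""
    lines = [
        "Role: You are assisting with a differential diagnosis exercise.",
        f"Patient: {age}-year-old {sex}.",
        f"Presentation: {presentation}",
        "Key findings:",
        *[f"- {finding}" for finding in findings],
        f"Task: {question}",
        "Format: For each of the top 3 diagnoses, list the supporting",
        "findings, the findings that argue against it, and one test that",
        "would confirm or exclude it.",
    ]
    return "\n".join(lines)

# A vague prompt invites a vague, unverifiable answer.
vague = "What's wrong with a patient who has chest pain?"

# The structured version gives the model constraints a learner can check.
focused = build_structured_prompt(
    age=58,
    sex="male",
    presentation="2 hours of substernal chest pressure radiating to the jaw",
    findings=["diaphoresis", "normal initial troponin",
              "ST depression in leads V4-V6"],
    question="Rank the most likely diagnoses and explain how to "
             "discriminate between them.",
)
print(focused)
```

The structure is what makes appraisal possible: because the output must be ranked and tied to named findings and tests, a learner can check each claim instead of accepting a fluent paragraph wholesale.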

AI literacy cannot end at graduation. These models evolve rapidly, with each year bringing more capable versions.
Practicing clinicians increasingly encounter LLM-driven decision support systems in their hospitals, and they need structured training to use them well.
Such training allows clinicians to get the most out of AI while preserving human judgment. As pediatrician and medical educator Dr. Lakshmi Krishnan observes, “When I see patients, I’m not simply processing data points but interpreting a narrative shaped by cultural context, experience, and human interaction.” AI excels at pattern-matching but struggles with that nuance, making clinician oversight essential.
One anecdote brings this home. In a recent episode of the medical drama “The Pitt,” an AI scribe entered incorrect details into a patient’s chart, and my niece, an ER nurse who was watching with me, immediately said, “That’s totally on the doctor. She should have reviewed it before submitting.”
Her response captures a critical truth: AI can streamline workflow, but accountability stays with the clinician. Studies confirm this: clinicians trained to scrutinize AI output for errors (e.g., hallucinations or contextual mismatches) achieve higher diagnostic accuracy and show less overreliance, much as prompt engineering sharpens inputs [8]. Without that review habit, even advanced tools falter.
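That review habit can also be built into the workflow itself. Below is a minimal sketch of a “review before submit” gate for AI-drafted notes; every class, field, and message here is a hypothetical illustration rather than any real EHR’s API, but the shape of the idea holds: nothing is filed until a named clinician attests to it.

```python
# A minimal sketch of a "review before submit" gate for AI-drafted chart
# notes. All classes and fields here are hypothetical illustrations, not a
# real EHR integration; the point is that the draft cannot be filed until a
# named clinician reviews it and takes ownership of the result.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DraftNote:
    patient_id: str
    ai_draft: str
    reviewed_by: str | None = None
    corrections: list[str] = field(default_factory=list)
    signed_at: datetime | None = None

    def attest(self, clinician: str, corrections: list[str]) -> None:
        """Record that a clinician reviewed the draft, noting any fixes."""
        self.reviewed_by = clinician
        self.corrections = corrections
        self.signed_at = datetime.now(timezone.utc)

    def submit(self) -> str:
        # Refuse to file anything a human has not signed off on.
        if self.signed_at is None:
            raise PermissionError(
                "AI draft cannot be filed without clinician attestation")
        return (f"{self.ai_draft}\n[Reviewed by {self.reviewed_by} "
                f"at {self.signed_at:%Y-%m-%d %H:%M} UTC]")

note = DraftNote(patient_id="12345",
                 ai_draft="Pt reports left knee pain x3 days...")
note.attest("Dr. A. Example",
            corrections=["laterality corrected: right knee, not left"])
print(note.submit())  # files only because attest() ran first
```

The design choice mirrors the anecdote: the gate does not make the AI more accurate, it makes the clinician’s review non-optional.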
But shifting the culture to accept AI, particularly in the diagnostic arena, presents real challenges. Clinical hierarchies are built on seniority and experience; an LLM has neither. When AI suggests a diagnosis that conflicts with a physician’s intuition, the default response is often to dismiss it. CME needs to address that directly, reframing AI from “junior assistant” to a capable reasoning partner whose suggestions should be interrogated and weighed: not obeyed by default, but not dismissed out of hand either. Learning how to interact with and query this partner effectively will become a vital skill, one with the potential to substantially improve patient outcomes in complex and rare diseases, where primary care physicians typically have little training [9].

On the plus side, AI is available 24/7, offers unlimited practice without burdening real patients, provides consistent evaluation criteria, and can personalize learning in ways traditional curricula struggle to match. But LLMs still have significant limitations that educational systems must proactively address.
Ultimately, if students delegate analytical tasks to GenAI and aren’t taught to critically assess the output, it can erode the independent judgment and error-based expertise essential for clinical practice. Instruction in AI-based systems and tools must therefore be accompanied by pedagogical scaffolding, digital literacy, and faculty oversight to prevent passivity, misinformation uptake, and ethical lapses like plagiarism [14].

The goal is not AI‑dependent physicians, but AI‑fluent ones: clinicians who can use these tools to sharpen their reasoning while preserving the human connection that makes medicine work. Just as we argued in our AI in Diagnostics article that AI must be treated as a supervised colleague, not an oracle, medical education must teach future doctors to supervise AI as carefully as they supervise students. That means pairing AI integration with robust hands‑on training, a continued emphasis on humanistic care, and clear ethical frameworks for when, and how, these powerful systems should be used.
Disclaimer: The mention of specific companies, products, or organizations in this article is for informational purposes only and does not imply endorsement. The companies whose products were referenced were not consulted or involved in the preparation of this content, nor did they provide any funding or compensation.
References
[1] N. Golchini, E. Passalacqua, L. Vaughn, R.-E. E. Abdulnour, T. Zack, and S. Finlayson, “Socratic AI: An Adaptive Tutor for Clinical Case Based Learning,” medRxiv, p. 2025.06.22.25329661, Jan. 2025, doi: 10.1101/2025.06.22.25329661.
[2] D. Jang et al., “MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, I. Habernal, P. Schulam, and J. Tiedemann, Eds., Suzhou, China: Association for Computational Linguistics, Nov. 2025, pp. 319–353, doi: 10.18653/v1/2025.emnlp-demos.24.
[3] J. Kang and J. Ahn, “Technologies, opportunities, challenges, and future directions for integrating generative artificial intelligence into medical education: a narrative review,” Ewha Med. J., vol. 48, no. 4, p. e53, Oct. 2025, doi: 10.12771/emj.2025.00787.
[4] I. Zafar et al., “Ten tips for utilizing AI to generate high quality OSCE stations in medical education,” Front. Med., vol. 12, p. 1744657, 2025, doi: 10.3389/fmed.2025.1744657.
[5] D. Sikri and V. Nuguri, “From notes to networks – A three-dimensional framework with integrated curation for using Google NotebookLM in health professions education,” Med. Teach., pp. 1–5, Jul. 2025, doi: 10.1080/0142159X.2025.2533408.
[6] Z. Zouakia, E. Logak, A. Szymczak, J.-P. Jais, A. Burgun, and R. Tsopra, “AI-Driven Objective Structured Clinical Examination Generation in Digital Health Education: Comparative Analysis of Three GPT-4o Configurations,” JMIR Med. Educ., vol. 12, p. e82116, 2026, doi: 10.2196/82116.
[7] F. E. A. Hassanein, Y. Ahmed, S. Maher, A. El Barbary, and A. Abou-Bakr, “Prompt-dependent performance of multimodal AI model in oral diagnosis: a comprehensive analysis of accuracy, narrative quality, calibration, and latency versus human experts,” Sci. Rep., vol. 15, no. 1, p. 37932, 2025, doi: 10.1038/s41598-025-22979-z.
[8] S. S. Everett et al., “From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis,” medRxiv, Jun. 2025, doi: 10.1101/2025.06.07.25329176.
[9] T. Abdullahi, R. Singh, and C. Eickhoff, “Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models,” JMIR Med. Educ., vol. 10, p. e51391, Feb. 2024, doi: 10.2196/51391.
[10] I. A. Qazi, A. Ali, A. U. Khawaja, M. J. Akhtar, A. Z. Sheikh, and M. H. Alizai, “Automation Bias in Large Language Model Assisted Diagnostic Reasoning Among AI-Trained Physicians,” medRxiv, p. 2025.08.23.25334280, Jan. 2025, doi: 10.1101/2025.08.23.25334280.
[11] F. Kücking et al., “Impact of AI recommendation correctness on diagnostic accuracy in clinical decision-making,” Int. J. Med. Inform., vol. 207, p. 106223, 2026, doi: 10.1016/j.ijmedinf.2025.106223.
[12] L. Dahmani and V. D. Bohbot, “Habitual use of GPS negatively impacts spatial memory during self-guided navigation,” Sci. Rep., vol. 10, no. 1, p. 6310, Apr. 2020, doi: 10.1038/s41598-020-62877-0.
[13] L. Hejtmánek, I. Oravcová, J. Motýl, J. Horáček, and I. Fajnerová, “Spatial knowledge impairment after GPS guided navigation: Eye-tracking study in a virtual town,” Int. J. Hum. Comput. Stud., vol. 116, pp. 15–24, 2018, doi: 10.1016/j.ijhcs.2018.04.006.
[14] J. S. Izquierdo-Condoy, M. Arias-Intriago, A. Tello-De-la-Torre, F. Busch, and E. Ortiz-Prado, “Generative Artificial Intelligence in Medical Education: Enhancing Critical Thinking or Undermining Cognitive Autonomy?,” J. Med. Internet Res., vol. 27, p. e76340, Nov. 2025, doi: 10.2196/76340.