General-Purpose LLMs Beat Clinical AI Tools Across 3 Medical Benchmarks
Updated · Nature.com · Jun 12
Summary
Frontier models topped specialized clinical tools in an independent study spanning 500 MedQA questions, 500 HealthBench items and 100 real physician queries reviewed blindly by 12 U.S. clinicians.
On real clinical queries, Gemini led with a 3.62 mean clinician rating, ahead of GPT at 3.54 and Claude at 3.52, while OpenEvidence scored 3.24 and UpToDate 3.17.
Google Search AI Overview performed roughly on par with the clinical tools, and UpToDate refused 19% of queries versus 1% to 3% for the frontier models.
Rates of safety flags for harmful content or hallucinations did not differ significantly across systems, suggesting the performance gap stemmed mainly from differences in completeness, clarity and overall clinical alignment.
The authors say the results challenge claims that domain-specific medical AI is inherently superior and argue for independent real-world testing before hospitals deploy such tools.
The 2024 Clinical AI Revolution: General LLMs Outperform Specialists, Raising New Safety and Regulatory Imperatives
Overview
Since 2024, general-purpose large language models (LLMs) such as GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6 have advanced rapidly, and in recent head-to-head evaluations they have matched or surpassed specialized clinical AI tools. These versatile models are increasingly being integrated into healthcare settings. Yet both general and specialized tools are often adopted with limited independent evaluation, which underscores the need for rigorous, ongoing, quantitative assessment of their safety and effectiveness as the medical community adapts to them.