GPT-5, Claude and Gemini Flunk 40-Word Stroop Test as Accuracy Drops Near Zero

2 articles · Updated · SciTechDaily · Jun 14

Advanced AI models handled short Stroop tests but broke down on longer ones, with GPT-4o falling from 91% accuracy on five-word lists to 15% on 40-word lists.
Claude 3.5 Sonnet held up through 20 words before dropping to 24% at 40 words, and researchers saw similar declines in GPT-5, Claude Opus 4.1 and Gemini 2.5.
Mixed lists of matching and mismatched color words pushed performance even lower, with accuracy on mismatched items dropping to nearly zero as models reverted to reading the words instead of naming ink colors.
The study argues that transformer-based AI lacks the sustained executive control humans usually maintain in the Stroop task, exposing a limit in long-focus instruction following despite strong performance on many other tasks.
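The setup described above, mixed lists of matching and mismatched color words scored on whether the ink color is named, can be illustrated with a minimal sketch. The study's actual materials, prompts, and scoring are not given here, so everything in this example is an assumption for illustration: the color set, the hypothetical helpers `make_stroop_list` and `score`, and the simulated "word reader" that stands in for a model reverting to reading the words.

```python
import random

# Assumed color vocabulary; the study's actual word list is not specified.
COLORS = ["red", "green", "blue", "yellow"]

def make_stroop_list(n_items, congruent_fraction=0.5, seed=0):
    """Build (word, ink) pairs; a trial is congruent when word == ink."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_items):
        ink = rng.choice(COLORS)
        if rng.random() < congruent_fraction:
            word = ink  # matching item
        else:
            word = rng.choice([c for c in COLORS if c != ink])  # mismatched item
        trials.append((word, ink))
    return trials

def score(trials, responses):
    """Return accuracy separately for congruent and incongruent trials."""
    totals = {"congruent": 0, "incongruent": 0}
    correct = {"congruent": 0, "incongruent": 0}
    for (word, ink), answer in zip(trials, responses):
        kind = "congruent" if word == ink else "incongruent"
        totals[kind] += 1
        correct[kind] += answer == ink  # the correct answer is the ink color
    return {k: (correct[k] / totals[k] if totals[k] else None) for k in totals}

trials = make_stroop_list(40)
# Hypothetical failure mode: the responder reads the word and ignores the ink.
word_reader = [word for word, _ in trials]
print(score(trials, word_reader))
```

A responder that only reads the words is always right on matching items and always wrong on mismatched ones, which is why splitting accuracy by item type exposes the failure that an overall average would partly hide.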

Sources

SciTechDaily · 23h ago

GPT-5 and Advanced AI Models Fail Human Attention Stroop Test, Showing Weakness in Sustained Focus

kompas.id · 18h ago

Human Brain Versus AI When It Comes to Long Focus

Executive Control Gap: New Research Shows AI Models Fail Sustained Attention and Inhibitory Tasks (2026)

Overview

Studies reported in June 2026 found that leading AI language models have a fundamental weakness in tasks that require sustained focus and the inhibition of automatic responses, as shown by Stroop-like tests. Although these models can often mimic human behavior, their underlying attention mechanisms differ substantially from human cognitive processes. Researchers observed a clear performance collapse during such challenges, which points to a critical gap in the models' executive functions: despite their strengths, current AI systems struggle to maintain attention and control in ways that humans do naturally.

...