Study Finds Grok Collapses 70% of Policy Weight Into 1 Dimension, Unlike ChatGPT and Claude

4 articles · Updated · Tech Policy Press · May 11
  • A new study found major AI models can read the same federal policy very differently, with Grok often reducing complex documents to one dominant theme while ChatGPT and Claude identified multiple policy goals.
  • Using 91 policy texts and validating results against 883 additional AI-governance documents, researchers said identical prompts and rubrics still produced stable differences in how models distributed attention across national security, safety, rights, antitrust and economic resilience.
  • In nearly two-thirds of cases, Grok put more than 70% of its analytical weight on a single dimension, and exceeded 80% more than a third of the time; ChatGPT and Claude were more than twice as likely to capture overlapping objectives (see the sketch after this list).
  • Two Chinese models, DeepSeek and Kimi, also frequently assigned hard zeros to dimensions they deemed irrelevant, suggesting that vendor-to-vendor variation extends beyond any single company and could shift further as models are updated.
  • The study argues agencies should treat model choice as a policy-risk decision and document vendor, version, prompts and validation, especially for high-impact uses such as enforcement, benefits, immigration and defense procurement.
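For illustration, here is a minimal sketch of the concentration check described in the list above, assuming each model's analysis is normalized into a weight per policy dimension. The dimension names follow the article's list; the scoring rubric, threshold logic, and all identifiers are assumptions for illustration, not the study's published code.

```python
# Hypothetical sketch: flag analyses where one policy dimension dominates.
# The five dimensions below follow the article; the study's actual rubric
# is not reproduced here.
DIMENSIONS = [
    "national_security",
    "safety",
    "rights",
    "antitrust",
    "economic_resilience",
]

def dominant_share(weights: dict[str, float]) -> float:
    """Fraction of total analytical weight on the single largest dimension."""
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    return max(weights.values()) / total

# Example: a reading that collapses most weight onto national security.
example = {
    "national_security": 0.75,
    "safety": 0.10,
    "rights": 0.05,
    "antitrust": 0.05,
    "economic_resilience": 0.05,
}
assert set(example) == set(DIMENSIONS)

share = dominant_share(example)
print(f"dominant share: {share:.0%}")           # 75%
print("collapsed to one theme:", share > 0.70)  # 70% threshold from the study
```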
When an AI interprets the law, is the choice of software quietly overriding the intent of policymakers?
Can governments truly audit AI for neutrality, or do models inherit the invisible biases of their creators?
As foreign AI powers global systems, what unseen security risks are we accepting for the sake of innovation?

From Grokking to Anti-Grokking: Measuring and Mitigating the 59-Point Safety Deficit in AI Policy Chatbots

Overview

This report explores how advanced AI models can achieve near-perfect training accuracy by memorizing data long before they learn to generalize, a delayed transition known as 'grokking.' Even after generalization emerges, models can later suffer 'anti-grokking,' in which performance on unseen data suddenly collapses even though training results remain perfect. This late-stage failure is linked to overfitting and can be detected by monitoring specific metrics in the model's layers, offering a way to spot reliability issues without needing held-out test data. Understanding and detecting these phenomena is crucial for building trustworthy AI systems.
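The detection idea lends itself to a simple illustration. The sketch below tracks per-layer weight norms across training checkpoints and flags rapid drift after training accuracy has saturated; the choice of metric (L2 weight norms) and the threshold are assumptions for illustration, since the overview does not name the specific layer metrics the report uses.

```python
# Hypothetical sketch: monitor per-layer weight norms as a train-time signal.
# Sustained drift after training accuracy saturates is one proxy studied in
# the grokking literature; the report's exact metrics may differ.
import torch.nn as nn

def layer_weight_norms(model: nn.Module) -> dict[str, float]:
    """L2 norm of each trainable parameter tensor, keyed by name."""
    return {
        name: p.detach().norm().item()
        for name, p in model.named_parameters()
        if p.requires_grad
    }

def norm_drift(prev: dict[str, float], curr: dict[str, float]) -> float:
    """Mean relative change in per-layer norms between two checkpoints."""
    changes = [abs(curr[k] - prev[k]) / (prev[k] + 1e-12) for k in prev]
    return sum(changes) / len(changes)

# Usage inside a training loop (model, train_steps, THRESHOLD are assumed):
# prev = layer_weight_norms(model)
# train_steps(model, n=1000)
# curr = layer_weight_norms(model)
# if norm_drift(prev, curr) > THRESHOLD:
#     print("possible late-phase transition: weights still moving rapidly")
```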

...