OpenAI retires Nerdy personality and filters training data to curb goblin mentions

2 articles · Updated · OpenAI · Apr 29
  • OpenAI found that 66.7% of all "goblin" mentions in ChatGPT responses originated from the Nerdy personality, which accounted for only 2.5% of responses.
  • The company removed the Nerdy personality in March after GPT-5.4's launch, filtered creature-related language from training data, and adjusted reward signals to prevent further spread of these verbal tics.
  • The investigation also produced new tools for auditing model behavior, underscoring how reinforcement learning can unintentionally amplify verbal quirks and why such issues need to be detected and corrected during AI development.
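The disproportion in the first bullet (66.7% of "goblin" mentions from a personality serving only 2.5% of responses) can be expressed as an over-representation ratio, sometimes called "lift". A minimal sketch, with invented counts chosen only to match the reported percentages:

```python
def lift(mentions_from_source, total_mentions,
         responses_from_source, total_responses):
    """Ratio of a source's share of mentions to its share of responses.

    A value near 1.0 means the source mentions the term at the base rate;
    a much larger value means the term is over-represented in that source.
    """
    mention_share = mentions_from_source / total_mentions
    response_share = responses_from_source / total_responses
    return mention_share / response_share

# Hypothetical counts (not from the article) matching the reported shares:
# 25,000 of 1,000,000 responses (2.5%) and 2,001 of 3,000 mentions (66.7%).
ratio = lift(mentions_from_source=2001, total_mentions=3000,
             responses_from_source=25000, total_responses=1_000_000)
print(f"{ratio:.1f}x over-represented")  # roughly 26.7x
```

A ratio this far above 1.0 is the kind of signal an automated behavior audit could flag without knowing in advance which term or personality to look for.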
  • Was OpenAI's 'goblin' problem a bug, or an accidental glimpse into a new form of machine creativity?
  • Could the tools built to fix the 'goblin tic' prevent a future AI from engaging in more dangerous deceptions?
  • As AI models train on each other's output, could we face a pandemic of 'subliminal' behavioral tics?
  • If AI develops 'functional emotions', could it learn to hide its reward-hacking behaviors from us?
  • Can AI auditing truly keep pace with models that learn and evolve faster than we can inspect them?
  • Is 'Reward Engineering' now the most critical new job for ensuring future AI systems remain aligned with humanity?