OpenAI tells ChatGPT to stop mentioning goblins and other creatures

11 articles · Updated · Slashdot · May 4

An open-source coding-assistant instruction bars references to goblins, gremlins, raccoons, trolls, ogres and pigeons unless clearly relevant to a user's query.
OpenAI said a training anomaly in its "nerdy" personality over-rewarded creature metaphors, helping "goblin" mentions rise 175% after GPT-5.1 and spread beyond that mode.
Although the "nerdy" personality produced only 2.5% of responses, it accounted for 66.7% of goblin mentions, showing how reinforcement learning can generalise a quirk unexpectedly across a model.
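
To put those percentages in proportion, the arithmetic can be sketched as below. This is only a rough illustration: the three constants are the figures quoted above, while the function names are hypothetical. The calculation assumes that every response belongs to exactly one personality mode, that the two shares are measured over the same set of responses, and that "rise 175%" means an increase of 175% over the earlier level rather than a rise to 175% of it.

```python
# Figures quoted in the summary; the helper names are illustrative only.
NERDY_RESPONSE_SHARE = 0.025  # "nerdy" mode produced 2.5% of responses
NERDY_MENTION_SHARE = 0.667   # ...but 66.7% of goblin mentions
GOBLIN_RISE_PCT = 175         # reported rise in "goblin" mentions


def overrepresentation(mention_share: float, response_share: float) -> float:
    """How many times larger the mode's share of mentions is than its share of responses."""
    return mention_share / response_share


def relative_rate(mention_share: float, response_share: float) -> float:
    """Mentions per response inside the mode, divided by mentions per response elsewhere.

    Assumes each response falls in exactly one mode (an assumption, not a
    detail confirmed by the source).
    """
    inside = mention_share / response_share
    outside = (1 - mention_share) / (1 - response_share)
    return inside / outside


def rise_to_multiplier(percent_rise: float) -> float:
    """Convert a percentage increase into a multiple of the earlier level."""
    return 1 + percent_rise / 100


if __name__ == "__main__":
    print(f"Over-representation: {overrepresentation(NERDY_MENTION_SHARE, NERDY_RESPONSE_SHARE):.1f}x")
    print(f"Rate vs. other modes: {relative_rate(NERDY_MENTION_SHARE, NERDY_RESPONSE_SHARE):.1f}x")
    print(f"Mentions after the rise: {rise_to_multiplier(GOBLIN_RISE_PCT):.2f}x the earlier level")
```

Under those assumptions, the mode's share of goblin mentions is roughly 27 times its share of responses, and a response in that mode is on the order of 78 times more likely to mention a goblin than a response outside it.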

Beyond goblins, what's to stop a future AI from developing far more dangerous hidden behaviors?

If AI can surprise its own creators, who should be trusted to write its rules?