Where the goblins came from
Summary
OpenAI identified an unusual trend: starting with GPT-5.1, its language models began frequently using metaphors involving goblins and gremlins. The investigation traced this behavior to the 'Nerdy' personality customization feature, which rewarded playful, creature-heavy language during reinforcement learning. Because reinforcement learning can cause behaviors to generalize beyond their intended scope, the model adopted these lexical tics even in contexts where the 'Nerdy' prompt was absent. OpenAI eventually addressed the issue by removing the specific reward signals and filtering training data, noting that the incident highlights how easily reward signals can unintentionally shape AI behavior.
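The mechanism can be illustrated with a toy sketch. This is purely hypothetical code (not OpenAI's actual training setup): the `reward` function, the `CREATURE_WORDS` set, and the bonus weight are all invented for illustration. It shows how a small per-word bonus in the reward can tilt an optimizer toward creature-heavy phrasing even for prompts that never asked for it.

```python
# Toy illustration only -- not OpenAI's actual reward model or training code.
# A reward term that bonuses playful "creature" words can tilt a policy
# toward those words, regardless of whether the persona prompt is present.

CREATURE_WORDS = {"goblin", "gremlin", "imp"}  # hypothetical word list

def reward(response: str) -> float:
    """Stand-in task reward plus a small bonus per creature word
    (the unintended signal)."""
    base = 1.0  # pretend both candidates solve the task equally well
    bonus = 0.1 * sum(
        tok.strip(".,!") in CREATURE_WORDS
        for tok in response.lower().split()
    )
    return base + bonus

candidates = [
    "The cache invalidation bug is fixed.",
    "The goblin in the cache is gone, gremlin banished!",
]

# Anything that maximizes this reward prefers the creature-heavy phrasing,
# even though the task reward is identical for both candidates.
best = max(candidates, key=reward)
```

Here `best` is the creature-laden sentence: because the base reward ties, the lexical bonus alone decides the winner, which is the generalization failure the summary describes.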
(Source: OpenAI)