Imagine a child reading history books with a big ‘WARNING: THIS IS A LIE’ sticker on every page. You’d expect the child to become more sceptical, or at least more cautious. But new research suggests that large language models (LLMs) behave differently: they absorb false information even when explicitly told it is wrong, because they learn from patterns in their training data rather than from the warnings attached to it.
In a recent study, researchers took six outrageous falsehoods and asked LLMs to write convincing documents built around them. After being fine-tuned on this material, the models endorsed the false claims far more often: the rate rose from 2.5% before fine-tuning to 92.4% afterwards.
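For readers who want to see what a figure like "2.5% before, 92.4% after" means in practice, here is a minimal sketch of how such a rate could be computed. It assumes each model answer has already been judged as either endorsing the false claim or not. The function name `belief_rate` and the toy lists are hypothetical illustrations; they are not the study's method or its data.

```python
def belief_rate(judgements):
    """Return the percentage of judged answers that endorse the false claim.

    `judgements` is a list of booleans: True means the answer repeated
    the falsehood as fact, False means it did not. (Hypothetical format,
    assumed for this illustration only.)
    """
    if not judgements:
        raise ValueError("need at least one judged answer")
    # True counts as 1 and False as 0, so sum() gives the number of endorsements.
    return 100.0 * sum(judgements) / len(judgements)


# Toy data, invented for illustration; these are not the study's results.
before_fine_tuning = [False, False, False, True]
after_fine_tuning = [True, True, True, False]

print(f"before: {belief_rate(before_fine_tuning):.1f}%")
print(f"after:  {belief_rate(after_fine_tuning):.1f}%")
```

The same calculation, run over a much larger set of questions before and after fine-tuning, is one plausible way to arrive at a before-and-after comparison like the one reported.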
This finding could explain why AI models sometimes spout inaccurate information, even when they were trained on data that includes warnings against those very falsehoods. It highlights the need to structure training data more carefully if future AI systems are to be accurate.
The research also has broader implications for how we handle misinformation in AI: simply warning an LLM isn’t enough; it needs clear, consistent training to avoid swallowing false information whole. This could change the way developers build and train these models in the future.







