Anthropic has unveiled a new, limited version of its highly anticipated cybersecurity model, Fable. However, early feedback from researchers suggests that Fable’s guardrails may be too strict. One security researcher noted that even innocuous tasks, such as reading a blog post, can trigger Fable’s guardrails, prompting it to halt the conversation and label it as a ‘cybersecurity’ or ‘biology’ topic.
The guardrails are in place to prevent misuse of the AI model, such as using it to develop malware. Yet many cybersecurity experts argue that these restrictions hinder practical applications. Matt Suiche, a cybersecurity veteran, pointed out that Fable may incorrectly flag ordinary software engineering tasks as ‘cybersecurity’ work, which limits what the model will actually do.
Anthropic has introduced an approval process for cybersecurity professionals, the Cyber Verification Program, which gives approved users more latitude in how they use the model. Even so, researchers remain critical of how haphazardly the guardrails are applied, and hope they will be refined over time as Anthropic works with the new generation of cybersecurity companies.
Anthropic’s approach to restricting Fable appears to rely on keyword matching, which can produce unexpected results. For example, a simple request for a code review could trigger the guardrails, causing the request to fall back to Claude Opus 4.8.
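To see why keyword matching can misfire in the way researchers describe, consider the toy filter below. It is a hypothetical illustration only: the topic names mirror the ‘cybersecurity’ and ‘biology’ labels reported by users, but the keyword lists, the `flag_topics` function and the substring-matching rule are assumptions, not a description of how Anthropic actually implements Fable’s guardrails.

```python
# Hypothetical toy keyword filter. This is NOT Anthropic's mechanism;
# the topics, keywords and matching rule are illustrative assumptions.
BLOCKED_TOPICS = {
    "cybersecurity": {"exploit", "vulnerability", "malware", "payload", "buffer overflow"},
    "biology": {"pathogen", "virus", "toxin"},
}


def flag_topics(prompt: str) -> list[str]:
    """Return every topic whose keywords appear anywhere in the prompt.

    Matching is a case-insensitive substring check, so it ignores intent
    entirely: defensive and offensive requests look the same.
    """
    text = prompt.lower()
    return sorted(
        topic
        for topic, words in BLOCKED_TOPICS.items()
        if any(word in text for word in words)
    )


# A routine code review mentions "buffer overflow" and gets flagged.
review_request = "Please review this C code and check for any buffer overflow or other vulnerability."
print(flag_topics(review_request))  # ['cybersecurity']

# A harmless blog post about the flu gets flagged as biology.
blog_post = "This blog post explains how the flu virus spreads in winter."
print(flag_topics(blog_post))  # ['biology']

# Ordinary refactoring work passes.
print(flag_topics("Refactor this function to use a dict."))  # []
```

Because a filter like this only looks at vocabulary, a defensive code review is treated the same as a request to write an exploit; telling the two apart requires judging intent, which simple keyword lists cannot do.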







