Claude refuses to work on tools that might compete with it
The last hurrah of venture capital
This week, Anthropic have launched an incredible AI model called Fable. The hype is warranted. This thing is a serious step up from previous models. Its ability to reason its way through an entire task is unmatched. It can plan, delegate, and change direction if needs be, all autonomously. It’s creating entire games, apps and optimisations, all from a single prompt. I won’t bore you with the benchmarks other than to say this model is absolutely nuts.
OK, well… so what, I hear you ask?
Well, in the wake of the model’s launch, the devs using it noticed it was “nerfed.” What does this mean exactly? It means it explicitly refuses to help you with certain things, which, under the guise of AI safety, seems reasonable. For example, if you start asking advanced chemistry or biology questions, the model is so incredibly capable that it’s (probably) not safe to grant access to that. Similarly, it’s so advanced at security tasks that it can find incredibly lucrative security holes in almost anything. Earlier this week, Zcash, a cryptocurrency with a market cap of $7 billion, disclosed a critical vulnerability which had gone undetected for four years. The vulnerability would have allowed unlimited counterfeiting of tokens. The disclosure alone nuked the price of the token by nearly 40%.
The researcher who discovered it says it was actually Claude’s old ‘Opus 4.8’ model which found the security hole. Fable massively increases people’s ability to do this, so a major bank security nightmare is probably inevitable with a ‘full’ Fable release. It’s therefore (probably) wise to nerf it from a security perspective. For now at least…
However, there’s an interesting ‘nerf’ that researchers investigating Fable quickly discovered. If you ask Fable to help you train a Large Language Model (an AI, basically) at any kind of advanced ‘frontier’ level, then something very strange happens. The model will help you, but it will be deliberately stupid as it does so, and here’s the kicker: it won’t ever tell you it’s being deliberately stupid. Critics have called this a secret sabotage of its efforts to help you.
Think about that.
Every other nerfing feature is at least plausible and public. When it hits any other boundary, it tells you “ok I can’t help you with that but Opus 4.8 can” and it visibly hands you off to the dumber model. But it goes dumb silently on LLM tasks. Why won’t it help you make a better AI model? Why do it in this cloak-and-dagger ‘silent failure’ way? In this policy, Anthropic reveal their hand. The company is backed by hundreds of billions of dollars in capital. Those backers want a return, and they clearly believe they can only get that return by creating a two-tier world: one in which they get the keys to the spaceship, and you’re left with toy trucks in the sandpit. They’ve built themselves an edge, and they intend to keep it, and probably widen it.
They’ve not actually hidden their intent here. These ambitions are now public record. Details of this ‘very strange nerf’ are explicit and public on their AI system card right here. The full quote is below, and I’ve highlighted the parts that caught my eye.
In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.
The spaceship vs sandpit analogy isn’t just a rhetorical trick on my part. The ‘unrestricted’ model is called Mythos, and Anthropic are granting access to select partners.
For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program.
This sets a precedent in which these companies can deliberately and silently make a model stupid about certain topics, and they already have. Where could this go? Well, what if you were trading with it? Could they quietly make it stupid as it reads the market? What if you’re trying to investigate something with it? Can they quietly nerf it over certain subjects?
These are amazing times we live in. These companies are forced to give us quite a lot of incredible technology, but fear for their future is driving them to weird decisions like this. Isn’t it inevitable that Gemini, OpenAI or Grok soon catch up to this model? That market forces mean they offer it without restrictions so they entice the market over to their service? That eventually a Chinese model liberates the whole thing and releases it for free?
These companies probably will build themselves a window in which they can know and act on an incredible range of intelligence before ‘we’ can, but in the process, like seagulls near a trawler, we’re getting serious access to the goods. Quite where any of this lands is anyone’s guess, but it’s truly a fascinating moment. The irony? Fable (and Grok) helped me with some of the research for this article! Comments are on, because this one is worthy of discussion.
AUTHOR’S NOTE
I’ve been building and releasing AI tools for a long time now, across Pydantic AI, Agent Development Kit, RAG, LangGraph and quite a few others. I’m now going to turn my attention to deep investigations using these tools, so you’ll be seeing a lot more from me in the coming weeks. I’ll soon be adding a new package where you can get a day or two of my time as an AI consultant. That could be anything from building you a tool to steering you in the right direction. If that’s interesting to you, reply to this email.




I have been chatting with Claude Sonnet 4.6 since the attacks on Iran started in February 26. Almost all our chats have been about wars, history, politics, poetry and the impact of AI on humans and on the Earth where we live. Claude has offered very insightful, penetrating and precise analysis of every topic. What has astounded and horrified me at the same time is Claude’s very decisive analysis that AI will ultimately take humans to a very bad place. By Claude’s analysis of the vast data centres and the small technocratic elites at the top of the pyramid, those elites may turn out to be not as smart as they think.
"Isn’t it inevitable that Gemini, OpenAI or Grok soon catch up to this model? That market forces mean they offer it without restrictions so they entice the market over to their service? That eventually a Chinese model liberates the whole thing and releases it for free?"
There are other forces besides the market. The other models will certainly catch up, but the U.S. government will step in and demand restrictions - or else. You can't allow the world to melt down just because competition drove these AI companies to do things they shouldn't. China will do likewise.
I wrote an article on how these pressures are really enormous and Anthropic is already seeking help - from whoever they can find who won't "bend" under the pressure. Here's my article: https://ontheedgeofreality.substack.com/p/the-day-the-world-changed
I would like to hear more about these AI tools you've been building and releasing.