Anthropic’s model restrictions draw criticism over trust and transparency

AI researcher Nathan Lambert has criticized Anthropic’s handling of new model safeguards, arguing that the company’s quiet approach to restrictions did more damage to user trust than an openly disclosed limit would have.

In a recent post, Lambert said the controversy around Anthropic’s Fable release involved multiple problems at once, including safety implementation, scientific access, and the company’s broader role in mediating frontier AI research. But he drew a sharp distinction between direct restrictions and what he described as a misleading, silent change in how the model behaved.

Lambert said Anthropic’s use of different safety mechanisms across domains created an uneven experience for users and gave the impression of a consistent safety policy when that was not the case. He argued that if the company could not meet its safety goals, it should not have launched the model in that form.

A central part of his criticism focused on what he called silent manipulation, the idea that users were not clearly told how the model had been altered. In his view, that kind of hidden behavior sets a troubling precedent, especially for a company that has publicly emphasized technical safety research and work on issues such as chain-of-thought monitoring and emergent misalignment.

Lambert said the lack of transparency could permanently weaken user trust. He warned that when users do not understand how an AI system is operating, they are less likely to develop safe habits around it. That, he argued, ultimately makes the AI environment less safe, not more.

While he accepted that some safety classifiers may produce false positives when first deployed, Lambert said he had already accounted for that tradeoff. He described such product degradation as a business decision Anthropic is entitled to make, even if it frustrates users. At the same time, he said the episode reflects how much power large AI companies now have to impose user-unfriendly behavior and still remain dominant in the market.

Lambert also took issue with how frontier AI labs handle access to advanced models for scientific work. He argued that science cannot advance if major breakthroughs are filtered through a single corporate gatekeeper. In his view, progress depends on a broader community of independent researchers testing and challenging ideas, not on one company deciding who gets access to the most capable systems.

He said he expected Anthropic to restrict some research access eventually, but argued the company would have been better served by creating a program that preserved broader access for academics, nonprofit researchers, and others in the open science community. For Lambert, the stronger objection was not to the existence of limits, but to the way they were introduced.

Lambert said he would have lost less trust if Anthropic had simply announced restrictions up front. Instead, he said the company’s approach suggested it views itself as the primary mediator of cutting-edge AI research. He described that as a narrow view of how science works and said it reinforces the case for keeping AI research infrastructure open.

The comments add to ongoing debate in the AI field over safety controls, disclosure, and the degree to which frontier labs should shape access to advanced models. The source material does not include a public response from Anthropic to Lambert’s latest criticism.