Anthropic says AI is already helping build better AI

Anthropic has published a new report arguing that artificial intelligence is no longer just a tool for software teams. It is becoming a direct contributor to the development of the models themselves.

The company says its internal data and outside benchmarks show AI systems are increasingly capable of writing code, running experiments and handling longer, more complex tasks. While Anthropic says fully autonomous recursive self-improvement is not here yet, it warns that the industry may be moving toward that point faster than many institutions expect.

Recursive self-improvement refers to an AI system helping to build an improved successor to itself. In Anthropic’s framing, that would mean a model could help design, train and refine the next version of itself with only limited human direction. The company says that would be a major technical milestone, with potential benefits in science, healthcare and other fields. At the same time, it could raise the stakes for oversight, security and control.

Evidence from benchmarks and internal use

Anthropic points to public evaluations showing steady gains in model performance. The company says the length of tasks AI systems can reliably complete on their own has been increasing quickly, with the doubling time for those task horizons shortening to about four months. It also cites software engineering and research benchmarks that have moved from low scores to near saturation in a relatively short period.
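
To put a four-month doubling time in perspective, the short sketch below works through the arithmetic. It is only an illustration: it assumes the doubling time stays fixed at four months, whereas the report describes it as shortening, and the function name `growth_factor` is hypothetical rather than anything taken from Anthropic's report.

```python
def growth_factor(months_elapsed: float, doubling_months: float = 4.0) -> float:
    """Multiplicative growth after `months_elapsed`, assuming a constant doubling time."""
    return 2.0 ** (months_elapsed / doubling_months)


# Assumption: the doubling time holds steady at four months over the whole period.
for months in (4, 12, 24):
    print(f"{months:>2} months: x{growth_factor(months):.0f}")
```

Under that assumption, task length would grow about eightfold in a year and about 64-fold in two years.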

On software engineering tests, Anthropic says its models have improved sharply on SWE-bench, a benchmark built around real open-source bug fixes. It also cites CORE-Bench, which measures whether a model can reproduce the results of published research. According to the report, performance on both tests has climbed rapidly over the past two years.

The company says the more important evidence comes from inside Anthropic itself. In its engineering and research work, Claude is now helping with tasks that once required much more direct human effort. Early use cases involved simple code suggestions. Later, coding agents began writing and editing files. Now, Anthropic says, autonomous agents can run code and hand work off to other agents for hours at a time.

Anthropic says that by May 2026, more than 80% of the code merged into its codebase was written by Claude. Before Claude Code launched in research preview in early 2025, that figure was in the low single digits. The company also says the amount of code merged per engineer per day rose sharply in 2025 and again in 2026, reaching roughly eight times the 2024 level in the second quarter of 2026.

Productivity gains, but with caveats

Anthropic cautions that lines of code are an imperfect measure of productivity because they capture quantity rather than quality. Even so, it says the jump reflects a real acceleration in output. In a poll of 130 employees in March 2026, the median respondent estimated that they were producing about four times as much output with Mythos Preview as they would without AI assistance, for work they would have done anyway.

The company also describes cases where Claude enabled work that likely would not have happened otherwise. In one example, it says Claude produced more than 800 fixes in April 2026 that reduced a category of API errors by a factor of 1,000. An engineer overseeing the work estimated that a human would have needed years to complete it.

Anthropic says Claude is also improving as a code writer and reviewer. It reports that staff now intervene less often during coding sessions, even on open-ended problems, and that Claude-written code is approaching human quality. The company says an automated Claude review of past code changes would have caught about a third of the bugs behind earlier incidents before they reached production.

The report ultimately presents AI development as a feedback loop that is becoming faster and more automated. Anthropic says the current gap is not in executing well-defined tasks, where AI is already strong, but in choosing goals and deciding what to work on next. Closing that gap, it argues, would mark the start of a much more consequential phase in AI development.