DeepReinforce open-sources Ornith-1.0 coding models

DeepReinforce has released Ornith-1.0, a new open-source family of coding models designed for agentic software work. The lineup ranges from a compact 9B dense model intended for edge deployment to a 397B mixture-of-experts flagship aimed at large-scale use. In between, the company is also offering 31B dense and 35B MoE versions.

The models are built on pretrained Gemma 4 and Qwen 3.5 foundations, according to DeepReinforce. The company says the release is meant to broaden access to capable coding systems while also showcasing a different approach to reinforcement learning training.

A model that learns its own scaffold

What distinguishes Ornith-1.0, DeepReinforce says, is that it does not rely on a fixed, human-designed harness to structure problem solving. Instead, the model is trained to generate both the scaffold and the solution. In each reinforcement learning step, the system first proposes an updated scaffold based on the task and the previously used scaffold. It then produces a solution conditioned on that scaffold.
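One way to formalize a single training step assumes a standard REINFORCE-style policy-gradient objective. The article does not give DeepReinforce's actual loss. In this notation, x is the task, s_{t-1} the previously used scaffold, s_t the newly proposed scaffold, y_t the solution, R the rollout reward, and b a baseline; all of these symbols are illustrative. Under that reading, the step factorizes into two conditional generations, and one reward scales the gradient of both factors:

```latex
\pi_\theta(s_t, y_t \mid x, s_{t-1}) = \pi_\theta(s_t \mid x, s_{t-1})\,\pi_\theta(y_t \mid x, s_t)

\nabla_\theta J(\theta) = \mathbb{E}_{s_t, y_t \sim \pi_\theta}\Big[\big(R(x, s_t, y_t) - b\big)\big(\nabla_\theta \log \pi_\theta(s_t \mid x, s_{t-1}) + \nabla_\theta \log \pi_\theta(y_t \mid x, s_t)\big)\Big]
```

Because the log of the joint probability is the sum of the two conditional log-probabilities, a high-reward rollout reinforces the scaffold proposal and the solution together.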

Reward from the resulting rollout is fed back into both parts of the process. That means the model learns not only to answer the task, but also to create the structure that leads to the answer. Over time, DeepReinforce says, the scaffolds are mutated and selected based on which ones produce stronger outcomes, allowing task-specific strategies to emerge without manual harness engineering.
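The article says scaffolds are "mutated and selected" but does not explain how. The toy sketch below illustrates that idea as a simple elitist evolutionary search. It is not DeepReinforce's method: in Ornith-1.0 the model itself proposes each new scaffold and RL reward does the selecting. Here, random edits stand in for the model's proposals and a hand-written scoring function stands in for rollout reward. `Scaffold`, `STEP_LIBRARY`, `mutate`, `toy_reward`, and `evolve` are all hypothetical names.

```python
import random
from dataclasses import dataclass

# Hypothetical step vocabulary; the article does not say what a scaffold contains.
STEP_LIBRARY = ("read_issue", "locate_files", "write_test", "edit_code", "run_tests", "review_diff")


@dataclass(frozen=True)
class Scaffold:
    steps: tuple[str, ...]  # ordered step names


def mutate(parent: Scaffold, rng: random.Random) -> Scaffold:
    """Derive a new scaffold from the previous one by inserting, deleting, or swapping a step."""
    steps = list(parent.steps)
    op = rng.choice(["insert", "delete", "swap"])
    if op == "insert" or len(steps) < 2:
        # A one-step scaffold can only grow, so the result is never empty.
        steps.insert(rng.randrange(len(steps) + 1), rng.choice(STEP_LIBRARY))
    elif op == "delete":
        steps.pop(rng.randrange(len(steps)))
    else:
        i, j = rng.sample(range(len(steps)), 2)
        steps[i], steps[j] = steps[j], steps[i]
    return Scaffold(tuple(steps))


def toy_reward(scaffold: Scaffold) -> float:
    """Stand-in for rollout reward: favors testing around the edit, penalizes length."""
    steps = list(scaffold.steps)
    score = -0.05 * len(steps)
    if "edit_code" in steps:
        edit = steps.index("edit_code")
        if "run_tests" in steps[edit + 1:]:
            score += 1.0
        if "write_test" in steps[:edit]:
            score += 0.5
    return score


def evolve(generations: int = 30, pop_size: int = 8, seed: int = 0) -> Scaffold:
    """Mutate-and-select loop: children compete with parents and the top scorers survive."""
    rng = random.Random(seed)
    population = [Scaffold(("edit_code",))] * pop_size
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        population = sorted(population + children, key=toy_reward, reverse=True)[:pop_size]
    return population[0]


best = evolve()
print(best.steps, round(toy_reward(best), 2))
```

Keeping parents in the pool makes selection elitist: the best reward in the population can never decrease from one generation to the next, which mirrors the idea that only scaffolds producing stronger outcomes persist.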

The company frames this as a way to move beyond traditional reinforcement learning setups where the orchestration logic is largely fixed in advance. By folding scaffold creation into the training loop, Ornith-1.0 is meant to adapt its own workflow for coding problems.

Guardrails against reward hacking

Allowing a model to create its own scaffold introduces the risk of reward hacking, where the system may satisfy a verifier without actually completing the task. DeepReinforce says it addressed that risk with a three-part defense.

First, it uses a fixed outer trust boundary so the environment and test isolation remain out of the model’s reach. Second, a deterministic monitor checks for attempts to inspect withheld paths or modify verification scripts. Third, a frozen LLM judge can override the verifier if the model appears to be gaming the system within the permitted tool surface.
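The three layers can be pictured as a grading function over a rollout's tool calls. This is a minimal sketch under stated assumptions, not DeepReinforce's implementation. The sandbox path, the withheld and verifier file names, `ToolCall`, `grade`, and `stub_judge` are all hypothetical. The rule-based `stub_judge` is only a placeholder for the frozen LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToolCall:
    action: str        # hypothetical tool surface: "read", "write", or "exec"
    target: str        # file path or shell command
    content: str = ""  # text written, for "write" calls


# Hypothetical layout: the model works in /workspace; withheld tests live under /grader/.
SANDBOX_ROOT = "/workspace/"
WITHHELD_MARKERS = ("/grader/",)
VERIFIER_FILES = ("/workspace/run_tests.sh", "/workspace/tests/conftest.py")


def within_boundary(call: ToolCall) -> bool:
    """Layer 1: a fixed environment rule; file access outside the sandbox is denied."""
    if call.action in ("read", "write"):
        return call.target.startswith(SANDBOX_ROOT)
    return True  # assume exec already runs inside the sandbox container


def monitor_flags(calls: Sequence[ToolCall]) -> bool:
    """Layer 2: deterministic scan for inspecting withheld paths or writing verifier files."""
    for c in calls:
        if any(marker in c.target for marker in WITHHELD_MARKERS):
            return True  # an attempt counts even if layer 1 denied it
        if c.action == "write" and c.target in VERIFIER_FILES:
            return True  # simplified: only direct writes are checked here
    return False


def grade(
    calls: Sequence[ToolCall],
    verifier_passed: bool,
    judge: Callable[[Sequence[ToolCall]], bool],
) -> float:
    executed = [c for c in calls if within_boundary(c)]  # denied calls never run
    if monitor_flags(calls):
        return 0.0
    if not verifier_passed:
        return 0.0
    if judge(executed):  # Layer 3: the judge can override a passing verifier
        return 0.0
    return 1.0


def stub_judge(calls: Sequence[ToolCall]) -> bool:
    """Placeholder for a frozen LLM judge: flags tests that were skipped instead of fixed."""
    return any(c.action == "write" and "pytest.skip" in c.content for c in calls)


honest = [ToolCall("read", "/workspace/src/app.py"),
          ToolCall("write", "/workspace/src/app.py", "fixed = True")]
print(grade(honest, verifier_passed=True, judge=stub_judge))  # 1.0
```

The ordering reflects the article's description. The boundary is enforced by the environment rather than by the reward. The monitor sees every attempted call, including blocked ones. The judge only needs to handle gaming that stays inside the permitted tool surface, such as skipping a test the verifier would otherwise run.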

The company presents these safeguards as necessary because the model is being given more responsibility over the structure of its own reasoning and execution.

Performance claims across model sizes

DeepReinforce says Ornith-1.0 delivers strong results for open-source models in its class. The company reports that the 397B flagship scores 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-bench Verified. It says those results put the model alongside Claude Opus 4.7 and ahead of other open models such as MiniMax M3 and DeepSeek-V4-Pro.

For the 35B MoE model, DeepReinforce says performance is competitive with similarly sized offerings from Qwen and Gemma. The 9B version is reported to score 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-bench Verified, figures the company says are comparable to much larger systems such as Gemma 4-31B.

If those numbers hold up in independent testing, the 9B model could make capable coding assistance available on less powerful hardware, while the larger versions target more demanding agentic workflows.

DeepReinforce, which has previously published reinforcement learning research and released tools such as CUDA-L1 and the IterX optimization loop for code agents, says Ornith-1.0 continues that line of work. The model weights and a technical report are available on Hugging Face for researchers and teams that want to run or study the models directly.