JetBrains has introduced Mellum 2, a new open-weight language model aimed at software engineering workflows. The company says the model is designed for code generation, editing, debugging, multi-step reasoning, tool use, function calling, agentic coding, and conversational programming help.
The model appears in a technical report published on arXiv and is presented as the successor to JetBrains’ earlier Mellum system, which the company describes as more focused on code completion. Mellum 2 expands that scope with a broader set of programming-related capabilities and comes in several released checkpoints, including base, instruct, and thinking variants.
According to the report, Mellum 2 uses a 12-billion-parameter mixture-of-experts architecture with 2.5 billion parameters active per token. The model includes 64 experts, of which eight are active for each token, a design intended to balance performance and inference efficiency. JetBrains says the architecture was tested with deployment on commodity GPUs in mind.
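To make the "64 experts, eight active" figure concrete, the following is a minimal sketch of top-k expert routing for a single token. Only the expert count and the number of active experts come from the report; the function name `route_token`, the random router scores, and the choice to normalize weights with a softmax over the selected experts are illustrative assumptions, not a description of JetBrains' router.

```python
import math
import random

NUM_EXPERTS = 64  # expert count stated in the report
TOP_K = 8         # experts active per token, stated in the report


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and return (index, weight) pairs.

    Assumption: weights are a softmax over the selected logits only.
    This is one common convention; the article does not say which one
    Mellum 2 uses.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router scores for one token (seeded for reproducibility).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
```

Only the eight selected experts would run for this token, which is why the per-token compute tracks the active parameter count rather than the total.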
The system combines grouped-query attention, sliding window attention in most layers, and a multi-token prediction head. That head serves two purposes: it acts as an auxiliary training objective and also functions as a built-in draft model for speculative decoding, a method that can speed up generation.
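The speculative decoding idea can be sketched with toy stand-ins for the two models. In this simplified greedy version, a cheap draft proposes several tokens, the main model checks them, and the longest agreeing prefix is kept along with one token from the main model. The functions `speculative_step`, `target_next`, and `draft_next` are hypothetical, and a real system would verify all draft tokens in a single batched forward pass rather than in a Python loop.

```python
def speculative_step(prefix, draft_next, target_next, num_draft=4):
    """Run one round of greedy speculative decoding and return the new tokens.

    draft_next and target_next each map a token list to the next token.
    """
    # 1. The draft model proposes num_draft tokens autoregressively.
    proposal = []
    context = list(prefix)
    for _ in range(num_draft):
        token = draft_next(context)
        proposal.append(token)
        context.append(token)

    # 2. The target model verifies the proposal position by position.
    accepted = []
    context = list(prefix)
    for token in proposal:
        expected = target_next(context)
        if token != expected:
            # First disagreement: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        context.append(token)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted


def target_next(context):
    """Toy target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def draft_next(context):
    """Toy draft model: agrees with the target except right after a 3."""
    return 0 if context[-1] == 3 else (context[-1] + 1) % 10


new_tokens = speculative_step([0], draft_next, target_next)
```

In this greedy setting the output always matches what the target model would have produced on its own; the draft only changes how many tokens are confirmed per round.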
JetBrains says these design choices were validated through ablation studies, in which the team compared alternatives for their effect on speed and quality.
The company reports that Mellum 2 was pre-trained on roughly 10.6 trillion tokens. Training followed a three-stage curriculum that gradually shifted the data mix from general web material toward more curated code and mathematical content.
The report says the model was optimized with Muon under FP8 hybrid precision and trained with a warmup-hold-decay schedule that eventually reduced the learning rate to zero. After pre-training, JetBrains extended the context window to 128,000 tokens using a layer-selective version of YaRN, a technique for adapting models to longer inputs.
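A warmup-hold-decay schedule has three phases: the learning rate ramps up, stays flat, then falls. The sketch below assumes linear warmup and linear decay to zero; the article confirms only the three-phase shape and the final value of zero, so the function name `whd_learning_rate`, the decay shape, and every numeric setting here are illustrative assumptions.

```python
def whd_learning_rate(step, total_steps, peak_lr, warmup_steps, decay_steps):
    """Learning rate at a given step under a warmup-hold-decay schedule.

    Assumption: both the warmup and the decay are linear.
    """
    if warmup_steps <= 0 or decay_steps <= 0:
        raise ValueError("warmup_steps and decay_steps must be positive")
    if warmup_steps + decay_steps > total_steps:
        raise ValueError("warmup and decay phases exceed total_steps")
    if not 0 <= step <= total_steps:
        raise ValueError("step is outside the schedule")

    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    if step < decay_start:
        # Hold: stay at peak_lr.
        return peak_lr
    # Decay: fall from peak_lr to exactly 0 at the final step.
    return peak_lr * (total_steps - step) / decay_steps


# Hypothetical settings, chosen only to make the three phases visible.
schedule = [
    whd_learning_rate(s, total_steps=1000, peak_lr=1e-3,
                      warmup_steps=100, decay_steps=200)
    for s in (0, 50, 100, 500, 800, 900, 1000)
]
```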
The model was then post-trained in two steps: supervised fine-tuning, followed by reinforcement learning with verifiable rewards. The result is two user-facing variants: an instruct model that responds directly, and a thinking model that produces an explicit reasoning trace before giving its final answer.
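"Verifiable rewards" generally means the training signal comes from an automatic check rather than a learned preference model. For code, one plausible check is whether a generated function passes a set of test cases. The sketch below illustrates that idea only; the name `verifiable_reward`, the binary 0/1 scoring, and the test-case format are assumptions, and the article does not describe the checkers JetBrains actually used.

```python
def verifiable_reward(candidate_source, test_cases, func_name="solve"):
    """Return 1.0 if the candidate passes every test case, otherwise 0.0.

    Warning: exec runs arbitrary code. A real pipeline would run candidates
    in an isolated sandbox with time and memory limits.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
        for args, expected in test_cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all score zero.
        return 0.0
    return 1.0


# Hypothetical task: add two integers.
tests = [((1, 2), 3), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"
broken = "def solve(a, b) return"

rewards = [verifiable_reward(src, tests) for src in (good, bad, broken)]
```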
In its report, JetBrains says Mellum 2 is competitive with open-weight models in the 4 billion to 14 billion parameter range on benchmarks covering code generation, mathematics, reasoning, tool use, general knowledge, and safety. The company emphasizes that the model delivers that performance at roughly the per-token compute cost of a 2.5-billion-parameter dense model, since only the active parameters are used for each token.
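A rough back-of-the-envelope calculation shows what the compute claim implies. The sketch uses the common approximation of about two floating-point operations per parameter per token for a forward pass, ignoring attention costs. That rule of thumb and the bytes-per-parameter figures are assumptions for illustration, not numbers from the report. Note that all 12 billion parameters still have to be held in memory even though only 2.5 billion are used per token.

```python
TOTAL_PARAMS = 12e9    # total parameters, from the report
ACTIVE_PARAMS = 2.5e9  # parameters active per token, from the report


def forward_flops_per_token(params_used):
    """Approximate forward-pass FLOPs per token.

    Assumption: about 2 FLOPs (one multiply, one add) per parameter used,
    with attention cost ignored.
    """
    return 2 * params_used


def weight_memory_gb(param_count, bytes_per_param):
    """Memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / 1e9


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

# How much more per-token compute a dense 12B model would need.
compute_ratio = dense_flops / moe_flops

# Weight storage at 16-bit and 8-bit precision (illustrative choices).
memory_16bit_gb = weight_memory_gb(TOTAL_PARAMS, 2)
memory_8bit_gb = weight_memory_gb(TOTAL_PARAMS, 1)
```

Under these assumptions a dense model of the same total size would need about 4.8 times the per-token compute, while the weights alone would occupy roughly 24 GB at 16 bits or 12 GB at 8 bits.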
The release is notable because it combines a coding-focused product direction with open-weight distribution. JetBrains says the model, along with the report describing its data pipeline and training recipe, is available under the Apache 2.0 license.
For developers, the release signals another push toward specialized models that target software work rather than general chat. The model’s long context window, reasoning-oriented variant, and support for tool use suggest a focus on workflows where coding assistants must handle larger projects, follow multi-step instructions, and interact with external tools.
JetBrains has not framed Mellum 2 as a general consumer chatbot. Instead, the launch underscores a familiar trend in the AI market: vendors are increasingly tuning models for specific developer tasks, while trying to keep inference costs low enough for practical deployment on common hardware.