MiniMax launches open-weight M3 model for coding and agentic tasks

MiniMax has released M3, an open-weight model the company says is built for advanced coding and agentic work. The model is designed to handle long, complex tasks, and MiniMax says it combines three capabilities that are usually associated with closed frontier systems: a 1 million token context window, native multimodal input, and the ability to operate a desktop computer.

According to MiniMax, M3 is the first open-weight model to bring those features together. The company said the model uses a new sparse attention architecture called MSA, short for MiniMax Sparse Attention, which was developed in-house to help scale context without the heavy compute cost of full attention. MiniMax says the architecture allows M3 to process very long inputs more efficiently and is one reason the model can support a 1M-token context window.
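To give a rough sense of why sparse attention matters at this scale, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the article does not describe how MSA selects which tokens to attend to, so the function name `attention_score_pairs` and the per-query budget of 4,096 keys are assumptions chosen for the example, not details of MiniMax's design.

```python
from typing import Optional


def attention_score_pairs(seq_len: int, keys_per_query: Optional[int] = None) -> int:
    """Count query-key score computations for one attention head.

    Full attention compares every query with every key (seq_len ** 2).
    A generic sparse scheme is modeled here as a fixed budget of keys
    per query. Causal masking and selection overhead are ignored.
    """
    if keys_per_query is None:
        return seq_len * seq_len
    return seq_len * min(keys_per_query, seq_len)


CONTEXT = 1_000_000   # the 1M-token window cited by MiniMax
BUDGET = 4_096        # hypothetical budget, NOT a published MSA figure

full = attention_score_pairs(CONTEXT)
sparse = attention_score_pairs(CONTEXT, keys_per_query=BUDGET)
ratio = full / sparse

print(f"full attention:   {full:,} score pairs")
print(f"sparse attention: {sparse:,} score pairs")
print(f"reduction:        about {ratio:.0f}x")
```

Under these assumed numbers, full attention grows quadratically with context length while the sparse variant grows linearly, which is the general trade-off the company is pointing to.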

The company is positioning M3 as a step forward in software development assistance and autonomous workflows. MiniMax said the model shows major gains over its earlier M2 system in coding tasks, including bug fixing, frontend and backend development, and performance optimization. It also said M3 performs strongly on agentic tasks such as search, office-suite workflows, and some early financial use cases.

MiniMax shared benchmark results that it says place M3 at the frontier on several coding and software engineering evaluations. Those include SWE-Bench Pro, Terminal-Bench 2.1, SWE-efficiency, KernelBench Hard, and MCP Atlas. The company also argued that benchmark scores alone do not fully capture how coding agents are used in practice, since real users often collaborate with an assistant over multiple rounds instead of giving a single isolated prompt.

To better reflect that kind of usage, MiniMax said it built an interactive user simulator for training and evaluation. The system is meant to expose models to more realistic collaboration patterns, including clarifying requirements, revising plans, switching tasks, and iterating on a project over time. MiniMax said this helps the model move beyond passive instruction following and toward more active collaboration.
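The idea of a simulated user can be illustrated with a small Python sketch. MiniMax has not published how its simulator works, so everything here is hypothetical: the action names simply mirror the collaboration patterns listed above, `stub_assistant` stands in for a real model call, and the random selection is a placeholder for whatever policy the actual system uses.

```python
import random
from typing import Callable, List, Tuple

# Collaboration patterns named in the article, used as simulated user moves.
USER_ACTIONS = [
    "clarify_requirements",
    "revise_plan",
    "switch_task",
    "iterate_on_project",
]


def stub_assistant(action: str, turn: int) -> str:
    """Placeholder for a model call; returns a canned reply."""
    return f"turn {turn}: responding to {action}"


def run_session(
    seed: int,
    num_turns: int,
    assistant: Callable[[str, int], str] = stub_assistant,
) -> List[Tuple[str, str]]:
    """Simulate a multi-round session and return (user_action, reply) pairs."""
    rng = random.Random(seed)  # seeded so a session can be replayed exactly
    transcript = []
    for turn in range(num_turns):
        action = rng.choice(USER_ACTIONS)
        transcript.append((action, assistant(action, turn)))
    return transcript


for action, reply in run_session(seed=0, num_turns=5):
    print(f"user: {action:22s} assistant: {reply}")
```

The point of the sketch is the shape of the loop: the assistant is scored on a whole session with shifting requirements, not on a single isolated prompt.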

The company also highlighted M3's native multimodal training, saying the model was trained on mixed-modality data from the beginning rather than having multimodal capabilities added later. MiniMax said it found interleaved data to be more scalable than synthetic data and rebuilt parts of its text pretraining pipeline to support that approach.

In its release note, MiniMax pointed to a series of internal demonstrations meant to show how M3 handles long-horizon work. In one example, the model reportedly reproduced a research paper on its own over nearly 12 hours, generating commits and figures while working through experiments. In another, it was used to optimize a CUDA kernel on NVIDIA Hopper GPUs, with MiniMax saying the model completed multiple rounds of benchmarking and improved performance significantly. The company also described a separate test in which M3 was tasked with helping train smaller models through data synthesis, training, evaluation, and iteration.

MiniMax Code, the company’s coding product, has also been updated alongside M3. MiniMax said the product was designed to work closely with the new model and can break larger tasks into stages, use multiple agents, and run for extended periods without human intervention.

M3 is now available through MiniMax Code, the company’s token plan, and its API services. MiniMax is pitching the model as a collaborative tool for developers and researchers who need long-context reasoning, coding assistance, and multimodal understanding in a single system.