Alibaba introduces Qwen-AgentWorld to train agents with language-based world models

Alibaba researchers have unveiled Qwen-AgentWorld, a family of language world models designed to help general-purpose agents reason about and simulate environments before acting in them.

The project, described in a new arXiv paper, is built around the idea that a world model can predict how an environment changes in response to observations and actions. The authors say this capability can improve reasoning, planning and training for agents that must operate across different tasks and settings.
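In data terms, predicting how an environment changes means learning a mapping from the interaction so far, plus the agent's next action, to the observation the environment returns. The sketch below shows one way a logged trajectory could be split into such next-state-prediction examples. The `Step` and `Example` types, the `OBS:`/`ACT:` prefixes and the function name are hypothetical illustrations, not the data format used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str       # what the agent did
    observation: str  # what the environment returned afterwards


@dataclass
class Example:
    context: str  # initial observation plus all earlier action/observation pairs
    action: str   # the action whose consequence must be predicted
    target: str   # the observation the real environment actually produced


def to_next_state_examples(initial_observation: str, steps: list[Step]) -> list[Example]:
    """Split one logged trajectory into next-state-prediction training examples."""
    examples = []
    history = [f"OBS: {initial_observation}"]
    for step in steps:
        # Snapshot the history so far as the context for predicting this step's outcome.
        examples.append(Example("\n".join(history), step.action, step.observation))
        history += [f"ACT: {step.action}", f"OBS: {step.observation}"]
    return examples


# Example: a two-step shell session yields two prediction targets.
trajectory = [Step("ls", "notes.txt"), Step("cat notes.txt", "buy milk")]
for ex in to_next_state_examples("empty prompt", trajectory):
    print(f"{ex.action!r} -> {ex.target!r}")
```

Each example asks the model to predict what the environment says next, given everything it has seen so far, which is the "next-state prediction" framing in its simplest form.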

At the center of the release are two models, Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B. The team says these are the first language world models able to simulate agentic environments across seven domains using long chain-of-thought reasoning. The paper does not specify those domains in the abstract, but it says the models are meant to cover real-world agent behavior rather than a narrow benchmark setting.

To build the system, the researchers trained Qwen-AgentWorld on more than 10 million environment interaction trajectories collected from real-world environments in those seven domains, using a three-stage pipeline. The first stage, continual pre-training (CPT), exposes the model to state-transition dynamics and augmented professional corpora to instill general world-modeling ability. The second stage, supervised fine-tuning (SFT), activates the model's reasoning for next-state prediction. The final stage, reinforcement learning (RL), aims to improve simulation quality with a reward that combines rubric-based and rule-based signals.
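The paper says only that the RL reward mixes rubric-based and rule-based signals. As a rough illustration, and assuming predicted states are JSON objects, a rule-based check could score exact field matches against the real next state while a judge scores the prediction against rubric items. The function names, the blending weight `alpha`, the format gate and the keyword judge standing in for a model-based grader are all assumptions made for this sketch.

```python
import json
from typing import Callable, Optional

# Hypothetical grader signature: (prediction, rubric item) -> score in [0, 1].
Judge = Callable[[str, str], float]


def rule_reward(predicted: str, reference: str) -> Optional[float]:
    """Rule-based signal: fraction of reference fields the prediction reproduces exactly.

    Returns None if the prediction is not a JSON object (a format failure).
    Assumes `reference` is a non-empty JSON object.
    """
    try:
        pred = json.loads(predicted)
    except json.JSONDecodeError:
        return None
    if not isinstance(pred, dict):
        return None
    ref = json.loads(reference)
    return sum(pred.get(key) == value for key, value in ref.items()) / len(ref)


def rubric_reward(predicted: str, rubric: list[tuple[str, float]], judge: Judge) -> float:
    """Rubric-based signal: weighted average of the judge's per-item scores."""
    total_weight = sum(weight for _, weight in rubric)
    return sum(weight * judge(predicted, item) for item, weight in rubric) / total_weight


def hybrid_reward(predicted: str, reference: str, rubric: list[tuple[str, float]],
                  judge: Judge, alpha: float = 0.5) -> float:
    """Blend both signals; a prediction that fails the format check earns nothing."""
    rule = rule_reward(predicted, reference)
    if rule is None:
        return 0.0
    return alpha * rule + (1 - alpha) * rubric_reward(predicted, rubric, judge)


# Keyword matching stands in for a model-based judge in this sketch.
def keyword_judge(prediction: str, item: str) -> float:
    return 1.0 if item.lower() in prediction.lower() else 0.0


reference = '{"status": "ok", "files": 2}'
predicted = '{"status": "ok", "files": 3}'
print(hybrid_reward(predicted, reference, [("status", 1.0)], keyword_judge))  # 0.75
```

Gating on format is a choice made here so that unparseable predictions cannot collect rubric credit; how Qwen-AgentWorld actually weights its two signals is not reported here.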

Alongside the models, the team introduced AgentWorldBench, a new benchmark for evaluating language world models. According to the paper, the benchmark is built from real-world interactions involving five frontier models across nine established benchmarks. The authors say the test suite is designed to measure how well a model can simulate and reason about agentic environments.

In their report, the researchers say Qwen-AgentWorld outperforms existing frontier models on their evaluations. The abstract does not give detailed performance figures, but it presents the system as a step toward more capable foundation models for agents.

The work also explores two ways world modeling could improve agent training. In one setup, Qwen-AgentWorld acts as a separate environment simulator. The authors say this approach can scale to thousands of simulated real-world environments and support reinforcement learning for agents, with gains that exceed those from training in real environments alone. In the second setup, the world model serves as a unified agent foundation model. The team says using world-model training as a warm-up improves downstream results across seven agent benchmarks.
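The first setup amounts to swapping a real environment for one whose transitions are generated by the world model. A minimal sketch, assuming hypothetical `WorldModel` and `Policy` callables that would wrap model calls in practice, shows an agent collecting a trajectory entirely inside such a simulator. The toy counter world model only stands in for Qwen-AgentWorld and is not how the released models are invoked.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces; in practice both would wrap language-model calls.
WorldModel = Callable[[list[str], str], tuple[str, bool]]  # (history, action) -> (observation, done)
Policy = Callable[[list[str]], str]                         # history -> action


@dataclass
class SimulatedEnv:
    """An environment whose transitions are predicted by a world model, not executed for real."""
    world_model: WorldModel
    initial_observation: str
    history: list[str] = field(default_factory=list)

    def reset(self) -> str:
        self.history = [self.initial_observation]
        return self.initial_observation

    def step(self, action: str) -> tuple[str, bool]:
        # Ask the world model what the environment would return for this action.
        observation, done = self.world_model(self.history, action)
        self.history += [action, observation]
        return observation, done


def rollout(env: SimulatedEnv, policy: Policy, max_steps: int = 10) -> list[tuple[str, str]]:
    """Collect one (action, observation) trajectory inside the simulator."""
    env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(env.history)
        observation, done = env.step(action)
        trajectory.append((action, observation))
        if done:
            break
    return trajectory


# Toy stand-in for a learned world model: a counter that "finishes" at 3 presses.
def toy_world_model(history: list[str], action: str) -> tuple[str, bool]:
    presses = history.count("press") + (action == "press")
    return f"counter={presses}", presses >= 3


env = SimulatedEnv(toy_world_model, initial_observation="counter=0")
trajectory = rollout(env, policy=lambda history: "press")
print(trajectory)  # [('press', 'counter=1'), ('press', 'counter=2'), ('press', 'counter=3')]
```

Scaling to many simulated environments would then mean creating many `SimulatedEnv` instances with different initial observations and passing their trajectories to an RL trainer, which is outside the scope of this sketch.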

The paper is available on arXiv, and the authors have also published code on GitHub. As with many research preprints, the work has not yet gone through formal peer review.

The release adds to growing industry interest in agents that do more than generate text. By focusing on environment simulation and next-state prediction, Alibaba is aiming at a class of models that can support planning, experimentation and reinforcement learning, not just conversation. If the results hold up under broader testing, Qwen-AgentWorld could become a useful tool for building more adaptable AI agents.