AMAP-ML releases DreamX-World, an AI model for interactive video environments

AMAP-ML has released DreamX-World, a general-purpose world model designed to generate interactive video environments that can be explored and controlled rather than simply watched.

The project, shared through a paper, code repository and model weights on Hugging Face, aims to move beyond conventional video generation by producing worlds that respond to actions and event prompts with consistent motion and visual detail. According to the team, DreamX-World can simulate a range of environments, from realistic indoor and outdoor scenes to imaginative settings such as science-fiction and game-like worlds.

## Built for interactive simulation

The model was trained on a mix of Unreal Engine content, gameplay footage and real-world video. The team says it used camera estimation, data filtering and curated data mixtures to help the system learn world dynamics and interactive behavior across different kinds of scenes.

DreamX-World was developed through a staged training process. First, it learned basic world dynamics and fine-grained control from action inputs. It then gained the ability to react to open-ended events. After that, reinforcement learning was used to improve how well the model follows actions, maintains interaction consistency and preserves visual quality. The team says distillation and forcing techniques were then applied to make inference more efficient for practical use.

The release is intended to support research in areas including interactive generation, embodied intelligence, agent training and creative world building.

## Long videos and memory

One of the system’s central goals is to extend short interactive clips into longer rollouts. The team says DreamX-World can generate worlds over hundreds of frames while reducing the color and style drift that often appears in longer AI video sequences.
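The article does not say how the team measures drift. As a rough illustration only, color drift in a rollout could be tracked by comparing each frame's average color with that of the first frame. The function name `color_drift` and the choice of metric are assumptions made for this sketch, not part of the DreamX-World release.

```python
import numpy as np

def color_drift(frames):
    """Distance between each frame's mean RGB and the first frame's mean RGB.

    frames: array-like of shape (T, H, W, 3) with values in [0, 1].
    Returns an array of shape (T,); larger values mean the overall
    color balance has moved further from the starting frame.
    Hypothetical metric for illustration, not the team's evaluation.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean_rgb = frames.mean(axis=(1, 2))  # one average color per frame, shape (T, 3)
    return np.linalg.norm(mean_rgb - mean_rgb[0], axis=1)
```

A curve from this function that stays flat over hundreds of frames would suggest stable color, while a steadily rising one would point to the kind of drift the team says it is trying to reduce. It says nothing about style drift, which would need a perceptual measure.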

To improve consistency over time, the model uses what the team describes as geometry-guided memory retrieval. When the camera returns to a location it has already seen, retrieved memory frames are used to preserve layout, object identity and local appearance. That approach is meant to make long exploration feel more stable and spatially coherent.
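The team has not detailed the retrieval mechanism in this description. One simple way to picture the idea is a lookup keyed on camera pose: past frames are stored with the position and viewing direction they were seen from, and the frames whose pose best matches the current camera are fetched. The sketch below assumes exactly that; `retrieve_memory`, its thresholds and its scoring rule are hypothetical and chosen for illustration.

```python
import numpy as np

def retrieve_memory(query_pos, query_dir, mem_pos, mem_dir,
                    k=2, max_dist=2.0, min_cos=0.5):
    """Return indices of up to k stored frames whose camera pose is near the query.

    A stored frame is a candidate if its camera lies within max_dist of the
    query position and its viewing direction has cosine similarity of at
    least min_cos with the query direction. All thresholds are assumptions.
    """
    query_pos = np.asarray(query_pos, dtype=np.float64)
    query_dir = np.asarray(query_dir, dtype=np.float64)
    mem_pos = np.asarray(mem_pos, dtype=np.float64)
    mem_dir = np.asarray(mem_dir, dtype=np.float64)

    # Normalize directions so the dot product is a cosine similarity.
    query_dir = query_dir / np.linalg.norm(query_dir)
    mem_dir = mem_dir / np.linalg.norm(mem_dir, axis=1, keepdims=True)

    dist = np.linalg.norm(mem_pos - query_pos, axis=1)
    cos = mem_dir @ query_dir

    candidates = np.flatnonzero((dist <= max_dist) & (cos >= min_cos))
    # Lower score is better: close in space and looking the same way.
    score = dist[candidates] - cos[candidates]
    order = np.argsort(score, kind="stable")
    return candidates[order][:k].tolist()

# Four remembered frames: one far away, one facing the opposite way.
mem_pos = [[0.0, 0, 0], [10.0, 0, 0], [0.5, 0, 0], [0.4, 0, 0]]
mem_dir = [[1.0, 0, 0], [1.0, 0, 0], [-1.0, 0, 0], [1.0, 0, 0]]
print(retrieve_memory([0.1, 0, 0], [1.0, 0, 0], mem_pos, mem_dir))  # [0, 3]
```

In this toy version the retrieved indices would point to stored frames that a generator could condition on when the camera revisits a spot. A real system would likely reason about what is actually visible, for example through depth or field-of-view overlap, instead of pose distance alone.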

The project also supports both first-person and third-person views. In first-person mode, users or agents can move through generated spaces directly. In third-person mode, the camera can follow an agent through the world while keeping the motion coherent across space and time. The team says this could be useful for embodied agents and game-inspired simulations.

## Promptable events and varied scenes

Beyond movement and navigation, DreamX-World can also react to promptable world events: event prompts that change the generated environment while keeping the world coherent over time. The team says this opens the door to more complex scenarios in which agents encounter unexpected conditions and must adapt.

Examples shown by the team include realistic settings such as cities, forests, buildings and interiors, along with stylized or fantasy-like worlds. The project description positions this range as a step from simulation toward creation.

The team says the next major challenge is real-time interactive generation of long-duration worlds, where users or agents can continue acting and exploring without losing coherence. It identifies efficiency, memory and long-horizon stability as key obstacles to that goal.

DreamX-World is now available to the research community alongside its paper and code, as AMAP-ML looks to encourage further work on interactive world models and AI-driven simulation.