NVIDIA unveils ENPIRE framework for closed-loop robot policy improvement

NVIDIA has introduced ENPIRE, a framework designed to let coding agents iteratively improve robot policies in the physical world with far less human supervision.

The system is built around a closed feedback loop that repeatedly resets a task, runs a policy, checks whether the attempt succeeded, and then uses the outcome to refine the next attempt. NVIDIA says the approach is meant to turn real-world robot learning into an optimization process that agents can manage directly.
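The reset, run, verify, refine cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's implementation: the function names, the `Trial` record, and the one-dimensional toy task are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    push_distance: float  # the policy parameter tried in this round (hypothetical)
    success: bool         # binary outcome reported by the verifier


def closed_loop(
    reset: Callable[[], float],
    run_policy: Callable[[float, float], float],
    verify: Callable[[float], bool],
    refine: Callable[[float, List[Trial]], float],
    push_distance: float,
    rounds: int,
) -> List[Trial]:
    """Run reset -> rollout -> verify -> refine for a fixed number of rounds."""
    history: List[Trial] = []
    for _ in range(rounds):
        start = reset()                              # restore a starting condition
        final = run_policy(push_distance, start)     # execute the current policy
        success = verify(final)                      # automatic success check
        history.append(Trial(push_distance, success))
        push_distance = refine(push_distance, history)  # propose the next attempt
    return history


# Toy stand-ins for the hardware: a block must be pushed to position >= 1.0.
def toy_reset() -> float:
    return 0.0


def toy_run_policy(push_distance: float, start: float) -> float:
    return start + push_distance


def toy_verify(final_position: float) -> bool:
    return final_position >= 1.0


def toy_refine(push_distance: float, history: List[Trial]) -> float:
    # Push a little further after a failure; keep the parameter after a success.
    return push_distance if history[-1].success else push_distance + 0.25


history = closed_loop(toy_reset, toy_run_policy, toy_verify, toy_refine, 0.0, 6)
print([t.success for t in history])  # [False, False, False, False, True, True]
```

In the framework as described, the refine step is where a coding agent would inspect results and change code or training settings; here it is reduced to a one-line heuristic so the loop structure stays visible.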

A loop for real-world robot learning

According to NVIDIA, the main obstacle in dexterous robotics is not just control, but the amount of human effort needed to engineer and test policy changes. ENPIRE is intended to reduce that burden by giving coding agents a structured way to propose improvements and validate them on physical hardware.

The framework combines four parts. An Environment module handles automatic reset and verification. A Policy Improvement module launches the next round of refinement. A Rollout module evaluates policies on one or more robots in parallel. An Evolution module lets agents inspect logs, review relevant literature, and modify training infrastructure or algorithm code to address failures.

NVIDIA describes the system as a harness for agent-driven robotics research. The company says the setup allows for fair comparisons between training recipes and agent variants because each one can be tested in the same repeatable physical loop.

Real-world tasks and reported results

NVIDIA reports that, using ENPIRE, frontier coding agents built policies that reached a 99% success rate on several manipulation tasks, including Push-T, pin insertion, placing a pin into a box, and cutting a zip tie with a cutter.

The company says the policies were improved through different approaches, including heuristic learning, tool use, behavior cloning, offline reinforcement learning, and online reinforcement learning. It also says the framework can run faster on a robot fleet, where multiple robots operate in parallel.
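As a rough illustration of why a fleet speeds up evaluation, the sketch below runs rollouts for several robots concurrently and pools the results into one success rate. The robot identifiers, the `evaluate_on_fleet` helper, and the fake rollout function are hypothetical; a real system would replace the fake rollout with calls to robot hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def evaluate_on_fleet(
    robots: List[str],
    rollout: Callable[[str, int], bool],
    episodes_per_robot: int,
) -> float:
    """Run the same number of episodes on every robot in parallel.

    Returns the pooled success rate across all robots and episodes.
    """
    def run_robot(robot_id: str) -> List[bool]:
        # Episodes on a single robot are sequential; robots run side by side.
        return [rollout(robot_id, ep) for ep in range(episodes_per_robot)]

    with ThreadPoolExecutor(max_workers=len(robots)) as pool:
        per_robot = list(pool.map(run_robot, robots))

    outcomes = [ok for robot_outcomes in per_robot for ok in robot_outcomes]
    return sum(outcomes) / len(outcomes)


def fake_rollout(robot_id: str, episode: int) -> bool:
    # Stand-in for a physical rollout: every fourth episode fails.
    return episode % 4 != 3


print(evaluate_on_fleet(["robot-0", "robot-1"], fake_rollout, 4))  # 0.75
```

Threads are used here because a physical rollout is dominated by waiting on hardware rather than by computation; that choice is an assumption of the sketch, not something the article specifies.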

To measure efficiency in that setting, NVIDIA introduced two metrics. Mean Robot Utilization tracks how effectively the robots are being used, while Mean Token Utilization measures the efficiency of the language-model work involved in the process.
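The article does not give formulas for these metrics. One plausible reading of Mean Robot Utilization, offered here purely as an assumption, is the fraction of wall-clock time each robot spends executing rollouts, averaged over the fleet. The function name and the interval format below are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds) of one rollout


def mean_robot_utilization(
    busy_intervals: Dict[str, List[Interval]],
    wall_clock_seconds: float,
) -> float:
    """Average, over robots, of (time spent in rollouts) / (total elapsed time).

    Assumes the intervals for each robot do not overlap and lie within
    [0, wall_clock_seconds]. This definition is an illustrative guess.
    """
    if not busy_intervals or wall_clock_seconds <= 0:
        raise ValueError("need at least one robot and a positive time window")
    per_robot = [
        sum(end - start for start, end in intervals) / wall_clock_seconds
        for intervals in busy_intervals.values()
    ]
    return sum(per_robot) / len(per_robot)


log = {
    "robot-0": [(0.0, 30.0), (40.0, 60.0)],  # busy for 50 of 100 seconds
    "robot-1": [(10.0, 35.0)],               # busy for 25 of 100 seconds
}
print(mean_robot_utilization(log, 100.0))  # 0.375
```

Under this reading, time an agent spends reading logs or waiting on a model response shows up directly as idle robot time and lowers the score.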

Automated reset and verification

A key part of the system is task automation at the hardware level. NVIDIA says each task must be made self-resetting and self-verifying before an agent can improve it. That means the environment needs to restore itself to a randomized starting condition and determine whether each attempt succeeded without a human in the loop.

For one of the showcased tasks, zip-tie insertion, the company says it uses detectors and segmentation models to judge whether the strap passes through the head of the zip tie. The per-camera assessments are combined into a final binary reward.
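How the per-camera assessments are combined is not spelled out. A minimal sketch, assuming each camera yields a confidence score between 0 and 1 and that success requires agreement from a configurable number of cameras, might look like this. The camera names, threshold, and `min_agree` parameter are all hypothetical.

```python
from typing import Dict, Optional


def binary_reward(
    camera_scores: Dict[str, float],
    threshold: float = 0.5,
    min_agree: Optional[int] = None,
) -> int:
    """Fuse per-camera confidence scores into a single 0/1 reward.

    A camera votes "success" when its score meets the threshold. By default
    every camera must agree, which favors avoiding false positives; pass
    min_agree to relax this to a k-of-n vote.
    """
    if not camera_scores:
        return 0  # no evidence is treated as failure
    votes = [score >= threshold for score in camera_scores.values()]
    required = len(votes) if min_agree is None else min_agree
    return int(sum(votes) >= required)


print(binary_reward({"wrist": 0.9, "side": 0.8}))               # 1
print(binary_reward({"wrist": 0.9, "side": 0.2}))               # 0
print(binary_reward({"wrist": 0.9, "side": 0.2}, min_agree=1))  # 1
```

Requiring unanimous agreement is a conservative default: a falsely rewarded failure would mislead later policy improvement more than an occasionally missed success.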

The reset process is similarly automated. In the examples shown, tasks such as Push-T, pin insertion, zip-tie tying, and GPU insertion are returned to randomized initial states so that repeated trials can be run under consistent conditions.
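One way to get randomized but repeatable starting conditions is to draw initial poses from a fixed distribution using a seeded random number generator, so that different training recipes face the same sequence of starts. The sketch below assumes a planar task such as Push-T; the `Pose2D` type, the workspace bounds, and the seeding scheme are illustrative assumptions, not details from NVIDIA.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pose2D:
    x: float      # meters, relative to the workspace center (assumed units)
    y: float
    theta: float  # radians


def sample_reset_poses(
    seed: int,
    count: int,
    x_range: Tuple[float, float] = (-0.10, 0.10),
    y_range: Tuple[float, float] = (-0.10, 0.10),
) -> List[Pose2D]:
    """Draw `count` randomized initial poses, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator; does not touch global state
    return [
        Pose2D(
            x=rng.uniform(*x_range),
            y=rng.uniform(*y_range),
            theta=rng.uniform(-math.pi, math.pi),
        )
        for _ in range(count)
    ]


poses_a = sample_reset_poses(seed=7, count=3)
poses_b = sample_reset_poses(seed=7, count=3)
print(poses_a == poses_b)  # True: the same seed gives the same reset targets
```

On hardware, the reset mechanism would then have to drive the object to each sampled pose; the sketch covers only the choice of targets.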

Agent comparisons and scaling

NVIDIA also evaluated three coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6. The evaluation focused on how much research progress each agent could make over time on tasks such as Push-T and pin insertion.

The company additionally tested how performance changed as the number of agents increased from one to four to eight. It says larger teams could reach success faster, but they also consumed more tokens and placed more demand on coordination and summarization.

NVIDIA says ENPIRE points to a practical route for autonomous robotics research in real environments, while also acknowledging limits. The company notes that robot and compute resources are not always fully utilized, since agents spend time reading logs, writing code, debugging, and waiting on model responses. It also says larger fleets increase token consumption even when they improve speed.

The work also includes simulation experiments, which NVIDIA says can help separate agent behavior from hardware throughput and test whether improvement strategies transfer to broader manipulation tasks.