Nvidia opens XR AI public beta for building agentic experiences on AR glasses and headsets

Nvidia opens XR AI to developers in public beta

Nvidia has released XR AI in public beta as an open-source framework for building AI agents for AR glasses, XR headsets, and related wearable devices. The company says the framework is designed to close a common gap in extended reality development: the hardware can already capture live video and audio, but building useful AI experiences still requires stitching together models, enterprise data, tooling, and deployment infrastructure.

The platform is meant to provide a reusable base for connecting XR devices to GPU-accelerated AI services running in the cloud, in a data center, on a workstation, or at the edge. According to Nvidia, the goal is to let developers create agents that can see what a user sees, process spoken or typed requests, call business tools, and respond within the same XR session.

Nvidia positions the technology for hands-busy settings such as field service, remote assistance, industrial work, healthcare, and training. In those environments, an XR agent could surface relevant information, guide a worker through a procedure, confirm completed steps, or record evidence from the task.

Modular architecture for media, models, and tools

The company describes XR AI as a modular system. Live camera frames, microphone input, and other device messages are routed through an XR Media Hub, which can then connect to models, agents, and external tools. Nvidia says its Cosmos models are used for visual grounding, while Nemotron models handle language understanding, reasoning, and tool calling.
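
The routing idea described above can be illustrated with a small publish/subscribe sketch. This is not the XR Media Hub API, which the article does not detail; the `MediaHub` class, the message kinds, and the handler functions are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class MediaHub:
    """Toy hub (hypothetical): routes device messages to handlers by kind."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, kind, handler):
        """Register a handler for one kind of device message."""
        self._handlers[kind].append(handler)

    def publish(self, kind, message):
        """Send a message to every handler for its kind and collect replies."""
        # .get avoids creating entries for kinds nobody subscribed to.
        return [handler(message) for handler in self._handlers.get(kind, [])]


def ground_frame(message):
    # Stand-in for a call to a vision model.
    return f"vision saw frame {message['frame_id']}"


def understand_speech(message):
    # Stand-in for a call to a language model.
    return f"language heard: {message['text']}"


hub = MediaHub()
hub.subscribe("video", ground_frame)
hub.subscribe("audio", understand_speech)

if __name__ == "__main__":
    print(hub.publish("video", {"frame_id": 7}))
    print(hub.publish("audio", {"text": "what is this valve?"}))
```

Keeping the hub unaware of what each handler does is what lets camera frames, microphone input, and other messages share one entry point in a design like this.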

For enterprise integrations, the framework relies on Model Context Protocol, or MCP, which can expose data sources and internal tools to the agent. Nvidia also says developers can use orchestration frameworks such as NeMo Agent Toolkit to coordinate workflows across models and services. For applications that need rendered spatial content, the platform can also work with CloudXR.
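
To make the MCP piece more concrete, the sketch below builds and handles a tool call using the JSON-RPC 2.0 message shape that MCP defines for `tools/call`. It runs in a single process with no transport and does not use any MCP SDK or XR AI code; the `lookup_manual` tool and its data are invented for illustration.

```python
import json


def lookup_manual(part_number: str) -> str:
    """Hypothetical enterprise tool: return a maintenance note for a part."""
    manuals = {"PUMP-7": "Torque housing bolts to 12 Nm."}
    return manuals.get(part_number, "No manual entry found.")


# Hypothetical registry; a real MCP server would expose tools over a transport.
TOOLS = {"lookup_manual": lookup_manual}


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle_request(raw: str) -> str:
    """Dispatch a tools/call request to the registry and wrap the result."""
    request = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    if request.get("method") != "tools/call":
        return json.dumps({**base, "error": {"code": -32601, "message": "Method not found"}})
    params = request["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        message = f"Unknown tool: {params['name']}"
        return json.dumps({**base, "error": {"code": -32602, "message": message}})
    text = tool(**params["arguments"])
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({**base, "result": result})


if __name__ == "__main__":
    request = make_tool_call(1, "lookup_manual", {"part_number": "PUMP-7"})
    print(handle_request(request))
```

In a real deployment the agent would discover available tools from the server rather than hard-coding their names, so the registry here only stands in for that step.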

The architecture is designed to stay flexible and efficient. Nvidia says video data can remain in shared memory while only lightweight metadata moves through the system, reducing unnecessary data transfer and model calls. The same setup can support multiple clients and agents, with responses routed back to the correct participant.
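
A minimal sketch of that pattern, using Python's standard `multiprocessing.shared_memory` module, might look like the following. It is an illustration of the general technique, not Nvidia's implementation: the `FrameMeta` fields, the function names, and the in-memory `outboxes` dictionary are assumptions made for the example.

```python
from dataclasses import dataclass
from multiprocessing import shared_memory


@dataclass(frozen=True)
class FrameMeta:
    """Lightweight descriptor (hypothetical) passed around instead of pixels."""
    shm_name: str   # name of the shared-memory block holding the frame
    size: int       # number of valid bytes in the block
    client_id: str  # which XR client produced the frame
    width: int
    height: int


def publish_frame(pixels: bytes, client_id: str, width: int, height: int):
    """Copy a frame into shared memory once; return the block and metadata."""
    block = shared_memory.SharedMemory(create=True, size=len(pixels))
    block.buf[:len(pixels)] = pixels
    meta = FrameMeta(block.name, len(pixels), client_id, width, height)
    return block, meta


def read_frame(meta: FrameMeta) -> bytes:
    """Attach to the block named in the metadata and read the frame bytes."""
    block = shared_memory.SharedMemory(name=meta.shm_name)
    try:
        # The block may be larger than requested, so read only meta.size bytes.
        return bytes(block.buf[:meta.size])
    finally:
        block.close()


def route_reply(meta: FrameMeta, text: str, outboxes: dict) -> None:
    """Deliver a reply to the client that produced the frame."""
    outboxes[meta.client_id].append(text)


if __name__ == "__main__":
    outboxes = {"glasses-1": [], "headset-2": []}
    frame = bytes(range(12))  # stand-in for a 2x2 RGB frame
    block, meta = publish_frame(frame, "glasses-1", 2, 2)
    try:
        assert read_frame(meta) == frame
        route_reply(meta, "frame received", outboxes)
    finally:
        block.close()
        block.unlink()  # the producer owns the block and frees it
    print(outboxes)
```

Only the small `FrameMeta` record travels between components, and its `client_id` is what allows a reply to find its way back to the right participant.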

The modular design means developers can swap out clients, models, orchestration layers, or deployment environments without rebuilding the entire application. Nvidia says the framework is intended to support AI glasses, AR glasses, headsets, mobile devices, web clients, and CloudXR-enabled experiences.
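
One common way to get this kind of swappability is to code against small interfaces rather than concrete components. The sketch below shows the idea with Python's `typing.Protocol`; every class name here is hypothetical, and the stub models simply format strings rather than calling Cosmos, Nemotron, or any real service.

```python
from typing import Protocol


class VisionModel(Protocol):
    def describe(self, frame: bytes) -> str: ...


class LanguageModel(Protocol):
    def answer(self, question: str, scene: str) -> str: ...


class StubVision:
    """Stand-in vision component: reports only the frame size."""

    def describe(self, frame: bytes) -> str:
        return f"frame of {len(frame)} bytes"


class EchoLanguage:
    """Stand-in language component: echoes the question and scene."""

    def answer(self, question: str, scene: str) -> str:
        return f"Q: {question} | scene: {scene}"


class UppercaseLanguage:
    """Alternative language component with the same interface."""

    def answer(self, question: str, scene: str) -> str:
        return f"Q: {question} | scene: {scene}".upper()


class Agent:
    """Depends only on the two protocols, never on concrete classes."""

    def __init__(self, vision: VisionModel, language: LanguageModel) -> None:
        self.vision = vision
        self.language = language

    def respond(self, frame: bytes, question: str) -> str:
        scene = self.vision.describe(frame)
        return self.language.answer(question, scene)


if __name__ == "__main__":
    frame = b"abc"
    print(Agent(StubVision(), EchoLanguage()).respond(frame, "What is this?"))
    # Swapping the language component requires no change to Agent.
    print(Agent(StubVision(), UppercaseLanguage()).respond(frame, "What is this?"))
```

Because `Agent` only knows the two method signatures, replacing a model or moving it to a different deployment target is a change at construction time rather than a rewrite.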

Early use cases in research and manufacturing

Nvidia points to work in healthcare and manufacturing as examples of how the approach may be used. Researchers at Stanford School of Medicine and Princeton University have explored XR and AI workflows for stem cell therapy research, with the aim of helping scientists access contextual information and interact with lab systems without losing focus on procedures.

In manufacturing, Siemens is examining how XR AI and Nvidia DGX Spark might help factory engineers access maintenance information, troubleshoot issues, verify work, and document activity on the shop floor.

The company also outlines a step-by-step path for developers to test the platform, beginning with a GitHub repository that includes sample agents, model-server launchers, MCP servers, web clients, and XR workflows. Nvidia says developers can start with a basic multimodal agent, then add enterprise data, orchestration, and spatial rendering as needed.

The public beta makes the code available to teams that want to experiment with agentic XR systems combining live perception, voice interaction, enterprise connectivity, and spatial computing in a single stack. Nvidia is framing XR AI as a foundation rather than a finished product, with the beta aimed at helping developers assemble working prototypes faster across different industries.