IBM Research lays out CUGA guidance for building agentic applications

IBM Research has published new guidance for developers building agentic applications with CUGA, an open-source harness the team says can reduce the amount of orchestration code needed to ship working agents.

The article centers on CUGA, short for Configurable Generalist Agent, and describes it as a framework for handling the planning loop, tool execution, state management and other plumbing that often slows down agent development. To illustrate the approach, IBM Research also released cuga-apps, a collection of two dozen small applications meant to serve as examples developers can study and adapt.

According to the post, the goal is to make it possible to build a useful agent by defining only the tools it can use and the instructions it should follow. The rest of the behavior, including planning, execution and reflection, is handled by the harness. IBM Research says the apps were built as single-file FastAPI examples so they can be read end to end and copied into new projects.
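
The idea of an agent defined only by its tools and instructions can be sketched in plain Python. This is a minimal illustration, not CUGA's API: the AgentDefinition class, its tool decorator and the search_movies function are hypothetical names invented here, and the movie catalog is toy data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AgentDefinition:
    """Hypothetical container for the two things the developer supplies."""
    instructions: str
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def tool(self, fn):
        """Register a plain Python function as a tool under its own name."""
        self.tools[fn.__name__] = fn
        return fn


# The instructions are ordinary text; everything else is left to a harness.
movie_agent = AgentDefinition(
    instructions="Recommend movies. Only suggest titles returned by a tool.",
)


@movie_agent.tool
def search_movies(genre: str) -> list:
    """Toy inline tool backed by a hard-coded catalog (illustrative data)."""
    catalog = {"sci-fi": ["Arrival", "Moon"], "drama": ["Whiplash"]}
    return catalog.get(genre.lower(), [])
```

In this sketch the definition carries no planning or execution logic at all, which is the division of labor the post describes: a harness would read the instructions and the registered tools and decide when to call them.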

A harness for enterprise agent development

IBM Research argues that many agent projects spend significant time on infrastructure before they ever complete a task. That can include wiring up model clients, building tool adapters, handling state and deciding how the agent will communicate with a user interface. CUGA is positioned as a way to standardize those layers.

The system includes planning before action, tool use, code execution in different environments and a reflection step that can help the agent recover from mistakes and adjust its approach. The post says this design has helped CUGA perform well on benchmarks such as AppWorld and WebArena. It also says the harness offers configurable reasoning modes, labeled Fast, Balanced and Accurate, so developers can trade speed and cost against accuracy without changing the agent definition.
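
A toy version of the plan, act and reflect cycle might look like the following. It is a sketch under heavy assumptions: the plan is fixed up front rather than produced by a model, the reflect function is a stand-in that merely normalizes string arguments, and lookup_weather, reflect and run are hypothetical names with no connection to CUGA's internals.

```python
def lookup_weather(city: str) -> dict:
    """Toy tool: only knows lowercase keys and reports failure as data."""
    table = {"zurich": "rain", "austin": "sun"}
    if city in table:
        return {"ok": True, "data": table[city]}
    return {"ok": False, "error": f"unknown city: {city!r}"}


def reflect(args: dict) -> dict:
    """Stand-in for model-driven reflection: normalize string arguments."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in args.items()
    }


def run(plan, tools, max_retries=1):
    """Execute a fixed plan step by step, reflecting after each failed attempt."""
    trace = []
    for name, args in plan:
        for _attempt in range(max_retries + 1):
            result = tools[name](**args)
            trace.append((name, dict(args), result["ok"]))
            if result["ok"]:
                break
            # Adjust the arguments before the next attempt.
            args = reflect(args)
    return trace


trace = run(
    [("lookup_weather", {"city": " Zurich "})],
    {"lookup_weather": lookup_weather},
)
```

Here the first call fails on the badly formatted city name, the reflection step cleans it up, and the retry succeeds. A real harness would presumably use the model to decide how to revise a failed step.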

IBM Research says the same agent code can be run with different model providers by changing configuration rather than rewriting the application. The post mentions support for providers including OpenAI, Anthropic, watsonx, LiteLLM and Ollama through an environment-variable-based setup.
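
Provider selection through environment variables could work roughly as follows. The variable names AGENT_LLM_PROVIDER and AGENT_LLM_MODEL, the defaults and the ModelConfig class are assumptions made for this sketch; the post does not spell out the variables CUGA actually reads.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    model: str


# Provider names taken from the post; the lookup itself is illustrative.
SUPPORTED = {"openai", "anthropic", "watsonx", "litellm", "ollama"}


def load_model_config(env=None) -> ModelConfig:
    """Build a model config from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    provider = env.get("AGENT_LLM_PROVIDER", "ollama").strip().lower()
    if provider not in SUPPORTED:
        raise ValueError(f"unsupported provider: {provider!r}")
    model = env.get("AGENT_LLM_MODEL", "default")
    return ModelConfig(provider=provider, model=model)
```

Because the application only ever sees a ModelConfig, switching providers in this sketch means changing the environment, not the agent code.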

Examples built to be copied

The cuga-apps gallery spans a range of use cases, including a movie recommender, an IBM Cloud architecture advisor, research assistants, productivity tools and document-heavy applications. The post says the examples are intended to be both educational and practical, showing how CUGA can combine inline Python tools with shared MCP tools.

One example highlighted in the article is an IBM Cloud advisor that recommends real IBM Cloud services after checking the product catalog. Its instructions tell the agent to verify services before suggesting them and to avoid inventing names. IBM Research says that kind of structured prompting, paired with a reliable tool lookup, helps keep the agent grounded in real data.
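
The verify-before-suggesting pattern can be illustrated with a small lookup. This is a sketch only: the catalog below is a hard-coded stand-in for a real product catalog query, and verify_services is a hypothetical function name, not one taken from the cuga-apps example.

```python
# Stand-in for a live catalog lookup; a real tool would query the catalog.
CATALOG = {
    "cloud object storage": "Cloud Object Storage",
    "code engine": "Code Engine",
    "databases for postgresql": "Databases for PostgreSQL",
}


def verify_services(proposed: list) -> dict:
    """Split proposed service names into catalog-verified and rejected."""
    verified, rejected = [], []
    for name in proposed:
        canonical = CATALOG.get(name.strip().lower())
        if canonical is not None:
            verified.append(canonical)
        else:
            rejected.append(name)
    return {"verified": verified, "rejected": rejected}


result = verify_services(["code engine", "Quantum Blob Store"])
```

Only the names in the verified list would be passed on to the user, so a name the model invented never reaches the final recommendation.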

The post also emphasizes a convention used across the examples. Tools return structured success or failure envelopes rather than throwing raw exceptions, which IBM Research says makes the agent more resilient during long runs. In IBM Research’s description, declared failures are easier for the planner to handle than unexpected errors.
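
One way to apply that convention is a decorator that converts exceptions into a declared failure. The envelope keys used here (ok, data, error, message) and the enveloped and get_price names are assumptions for illustration; the post does not specify the exact shape the examples use.

```python
import functools


def enveloped(fn):
    """Wrap a tool so it always returns a success or failure envelope."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "data": fn(*args, **kwargs)}
        except Exception as exc:
            # Report the failure as data instead of letting it propagate.
            return {"ok": False, "error": type(exc).__name__, "message": str(exc)}
    return wrapper


@enveloped
def get_price(ticker: str) -> float:
    """Toy finance tool with made-up prices."""
    prices = {"IBM": 100.0}
    return prices[ticker]  # raises KeyError for unknown tickers
```

A caller can then branch on the ok field: get_price("IBM") yields a success envelope, while an unknown ticker yields a failure envelope naming the KeyError, and neither call interrupts the run.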

Shared tools and production controls

IBM Research says the examples can rely on public MCP servers for common capabilities such as web search, citation-focused research, geocoding, weather and finance data. Those services are presented as reusable building blocks that can be combined with app-specific functions.

The article also points to features intended for enterprise use, including declarative guardrails, multi-agent delegation and retrieval-augmented generation over documents. IBM Research says the same harness can be used in governed production settings without requiring a rewrite, which it presents as a key advantage for teams moving from prototype to deployment.

The broader message of the post is that CUGA is meant to shift agent work away from framework assembly and toward application design. IBM Research says developers should focus on the task, the tools and the instructions, while the harness takes care of the rest.