OpenRouter launches Fusion API to combine multiple model outputs

OpenRouter is betting on model teamwork

OpenRouter has introduced Fusion, a new API feature that combines responses from multiple AI models into a single answer. The company says the approach can outperform individual models on complex research tasks while remaining simple to call through the same interface developers already use for single-model requests.

Fusion works by sending one prompt to a panel of models in parallel, with web search and web fetch enabled. A separate judge model reviews the outputs and organizes them into structured observations, including points of agreement, disagreements, gaps, and unique findings. The final response is then written from that analysis. OpenRouter says the whole process runs server-side, so developers can access it with a standard model call.

The company is positioning Fusion as a way to capture the benefits of model diversity. In its announcement, OpenRouter said combining multiple models often produces stronger results than relying on one system alone, especially for deep research, where synthesis and source comparison matter.

Benchmark results on deep research tasks

To test the system, OpenRouter evaluated Fusion on 100 tasks from the DRACO benchmark, which is designed to measure research quality, reasoning, tool use, and concise synthesis. The benchmark spans areas such as academic research, finance, law, medicine, technology, UX design, and product comparison.

According to OpenRouter, panels consistently beat individual models in its tests. One configuration pairing Fable 5 with GPT-5.5 reached a score of 69.0 percent, ahead of Fable 5 alone at 65.3 percent. Another panel made up of Opus 4.8, GPT-5.5, and Gemini 3.1 Pro scored 68.3 percent.

OpenRouter also said a lower-cost panel using Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro outperformed several standalone frontier models. That panel scored 64.7 percent, close to Fable 5’s result and below the company’s top panel by only a small margin, while costing about half as much as the best-performing configuration cited in the post.

The company highlighted one additional result that may be of interest to developers: running Opus 4.8 against itself as a two-model panel produced a 65.5 percent score, compared with 58.8 percent for Opus 4.8 on its own. OpenRouter said that suggests the synthesis step itself adds value, even when the participating models are the same.

Guardrails and evaluation limits

OpenRouter said it encountered an issue during testing when panel models were able to find the benchmark’s grading rubric online through web search. The company said this was not deliberate cheating, but it did create a risk of contamination. It addressed the problem by excluding the locations hosting the benchmark results from search and fetch tools.

The company noted that its server tools support domain exclusion settings across models, which it said makes such safeguards easier to apply without custom changes for each model.

OpenRouter also said its evaluation followed the DRACO paper’s methodology closely, with one change. It used Gemini 3.1 Pro Preview as the judge model rather than the paper’s chosen judge, Gemini 3 Pro. The company said this means its scores are not directly comparable with the original benchmark paper, though it believes the relative rankings remain meaningful.

OpenRouter is offering Fusion through a direct model slug, openrouter/fusion, and through its chatroom interface, where users can choose a preset or build a custom panel. Developers can also use Fusion as a tool in other model workflows.

The launch comes as OpenRouter is leaning into infrastructure that helps developers route requests across many models and providers. With Fusion, the company is extending that role from model access toward model coordination, packaging multiple outputs into one response that it says can be stronger than any single model alone.