Sun Debuts Collaborative API for Real-Time Voice Agents

Sun has introduced a new API aimed at voice agents that need to behave more like participants in a conversation than simple speech interfaces. The company describes the product as a collaborative voice model, or CVM, and positions it as a real-time layer for human and agent interaction over WebSocket connections.

The launch centers on a claim that current voice AI systems still struggle with the basics of natural conversation. Sun says many existing tools force users to wait too long for responses, interrupt people mid-sentence, require repeated wake words, and fail to react when a speaker changes direction. The new API is designed to address those issues with faster turn-taking, interruption awareness, and support for longer, ongoing exchanges.

Rather than framing the offering as a text-to-speech service or a simple speech SDK, Sun is presenting it as a voice-native collaboration system. According to the company, the model is intended for collaboration between agents and humans in settings where conversations may overlap, users may barge in, and software may need to manage multiple speakers at once. Sun also says the system is built to preserve conversational state over longer sessions.

The company says responses can begin within about a second, which it argues makes interactions feel more immediate and natural. It also says the model can detect when someone is already speaking and hold off from talking over them. When interruptions do happen, Sun says the system can stop and adjust to what was said. The product is also designed to stay ready for follow-up questions without requiring users to repeat a wake word.
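Sun has not published how its turn-taking works internally, but the behavior described above can be pictured as a small state machine. The following is a minimal sketch under that assumption. The class, method, and state names are hypothetical and are not part of Sun's API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # nobody is speaking; the agent may start
    LISTENING = auto()  # a human is speaking; the agent stays quiet
    SPEAKING = auto()   # the agent is speaking


@dataclass
class TurnManager:
    """Hypothetical illustration of interruption-aware turn-taking."""
    state: State = State.IDLE
    log: list = field(default_factory=list)

    def on_user_speech_start(self) -> None:
        # A barge-in: if the agent is mid-utterance, it stops and listens.
        if self.state is State.SPEAKING:
            self.log.append("agent_stopped")
        self.state = State.LISTENING

    def on_user_speech_end(self) -> None:
        self.state = State.IDLE

    def try_speak(self, text: str) -> bool:
        # The agent holds off unless the floor is free.
        if self.state is not State.IDLE:
            return False
        self.state = State.SPEAKING
        self.log.append(f"agent_said:{text}")
        return True

    def on_agent_done(self) -> None:
        if self.state is State.SPEAKING:
            self.state = State.IDLE
```

In this sketch, a call to `try_speak` while a user is talking simply returns `False`, and a user who starts talking over the agent causes an `agent_stopped` entry in the log. A real system would also need audio-level voice activity detection, which is out of scope here.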

Sun is promoting a set of integrations that go beyond basic voice response. The company says the API can support live web search, dynamic memory updates, and the relaying of structured outputs from other agents into natural speech. In practical terms, that could allow a meeting assistant to summarize data from dashboards, interpret JSON output from another system, or deliver live updates during a call without forcing users to switch tools.
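To make the idea of relaying structured output into speech concrete, here is a rough sketch of the kind of transformation involved. It is not Sun's implementation: the function name and the JSON field names (`mrr_usd`, `churn_rate`) are assumptions chosen for illustration, and a production system would presumably use the model itself rather than fixed templates.

```python
import json


def to_spoken_summary(payload: str) -> str:
    """Turn a JSON metrics payload from another agent into a speakable sentence.

    The field names below are hypothetical examples, not a documented schema.
    """
    data = json.loads(payload)
    parts = []
    if "mrr_usd" in data:
        parts.append(f"monthly recurring revenue is {data['mrr_usd']:,} dollars")
    if "churn_rate" in data:
        parts.append(f"churn is {data['churn_rate'] * 100:.1f} percent")
    if not parts:
        return "No metrics were available."
    return "Right now, " + " and ".join(parts) + "."


if __name__ == "__main__":
    print(to_spoken_summary('{"mrr_usd": 120000, "churn_rate": 0.025}'))
```

The resulting text would then be handed to whatever speech layer is in use, so that a dashboard value arrives in the call as a sentence rather than as raw JSON.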

The company’s website includes example workflows that show an in-meeting assistant answering questions about metrics such as monthly recurring revenue and churn. Sun says the API also supports injected speech, allowing announcements or notifications to be pushed into an active call.

On the infrastructure side, Sun says the offering is built around low-latency WebSocket connections and is available through a product called Sun Zero. The company says the platform supports scalable usage, multiple concurrent connections, unlimited API keys on paid plans, and dedicated infrastructure for enterprise customers. It also advertises a 350,000-token context window, 99.9 percent uptime, and lower audio token costs than competitors.
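For readers weighing the uptime figure, the arithmetic is straightforward. The sketch below assumes the 99.9 percent figure is measured over a 30-day month or a 365-day year. Sun's article-level claim does not specify the measurement window, so both are shown.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int) -> float:
    """Minutes of downtime permitted by an uptime percentage over a period."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    monthly = allowed_downtime_minutes(99.9, 30)
    yearly = allowed_downtime_minutes(99.9, 365)
    print(f"Per 30-day month: about {monthly:.1f} minutes")
    print(f"Per 365-day year: about {yearly / 60:.2f} hours")
```

Under these assumptions, 99.9 percent uptime leaves room for roughly 43 minutes of downtime a month, or a little under nine hours a year.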

Pricing starts with a free tier that includes limited monthly minutes and two concurrent connections. Paid plans are aimed at prototypes, production apps, and higher-volume deployments, with an enterprise option for custom terms.

Sun is positioning the launch as a foundational layer for organizations building real-time, voice-enabled software. Its broader vision is a conversation system that can handle long sessions, adapt to meeting context, and connect people with agents in a more fluid way than current voice interfaces.