Vercel is warning developers that AI endpoints exposed to the public internet are a lucrative target for attackers seeking to steal paid model usage at scale. In a new write-up on protecting against token theft, the company argues that traditional web defenses such as IP rate limits and account logins are not enough to stop what it calls inference theft.

The core problem, according to Vercel, is the economics of AI requests. A single AI prompt can cost far more than an ordinary web call, which makes stolen inference attractive to attackers and expensive for the companies footing the bill. Vercel says that once an endpoint is exposed, abuse can quickly run up tens of thousands of dollars in charges.

Why standard defenses fall short

Vercel defines inference theft as the unauthorized use of someone else’s paid AI compute, either for free consumption or for resale. It says the attack has evolved beyond simple rate-limit abuse into a business model, where stolen calls are wrapped in compatible adapters and sold to downstream users at a discount.

The company says endpoints that allow meaningful control over a model prompt are especially vulnerable. AI playgrounds are described as the highest-risk format because they give callers broad control over model selection and parameters. Support bots and documentation assistants are somewhat less exposed, but Vercel says attackers can still manipulate them cheaply enough to make resale worthwhile.

According to the company, conventional web protections were designed for a different threat model. Attackers chasing high-value AI traffic can spread requests across large pools of residential proxy IPs and throwaway accounts, diluting the effectiveness of IP-based controls. In that setup, a check that only happens once per session can be bypassed once and then reused across large volumes of stolen calls.
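The dilution effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `PerIpLimiter` class, the limit of 10 requests per address, and the pool of 200 proxy addresses are hypothetical values chosen for illustration, not figures from Vercel. Only the 1,300 requests figure echoes the per-minute peak reported in the incident described later in the article.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per IP in one window.
class PerIpLimiter {
  private counts = new Map<string, number>();
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true if this IP still has quota left in the current window.
  allow(ip: string): boolean {
    const used = this.counts.get(ip) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(ip, used + 1);
    return true;
  }
}

// Sends `total` requests round-robin across `poolSize` addresses
// and counts how many the limiter lets through.
function allowedThroughPool(total: number, poolSize: number, limit: number): number {
  const limiter = new PerIpLimiter(limit);
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    const ip = `proxy-${i % poolSize}`; // assumed: attacker rotates addresses evenly
    if (limiter.allow(ip)) allowed++;
  }
  return allowed;
}

// One address: the limiter caps the attacker at the per-IP limit.
const singleIp = allowedThroughPool(1300, 1, 10);

// 200 addresses: each sends at most 7 requests, so none reaches the limit.
const proxyPool = allowedThroughPool(1300, 200, 10);

console.log({ singleIp, proxyPool });
```

In this toy setup the limiter stops almost everything from a single address but nothing from the rotating pool, because no individual address ever reaches its quota.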

A real attack on Vercel’s own service

Vercel says it saw the issue firsthand when traffic to its docs AI chat endpoint surged on April 12, 2026. The spike hit roughly 10 times normal traffic on Anthropic’s Claude Haiku 4.5 model and peaked at about 1,300 requests per minute. Vercel estimates that the pattern would have produced an inference cost run rate of more than $10,000 per day.
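For a sense of scale, the two reported figures can be combined in a rough back-of-envelope calculation. The sketch below assumes the peak rate was sustained for a full 24 hours and takes the run rate as exactly $10,000, neither of which the article states, so the implied per-request cost is an illustration rather than a number from Vercel.

```typescript
// Figures reported in the article.
const PEAK_REQUESTS_PER_MINUTE = 1300;
const DAILY_RUN_RATE_USD = 10000; // reported as "more than" this amount

const MINUTES_PER_DAY = 60 * 24;

// Requests per day if the per-minute rate held for 24 hours (an assumption).
function requestsPerDay(perMinute: number): number {
  return perMinute * MINUTES_PER_DAY;
}

// Average cost per request implied by a daily spend at that request rate.
function impliedCostPerRequest(dailyUsd: number, perMinute: number): number {
  return dailyUsd / requestsPerDay(perMinute);
}

const dailyRequests = requestsPerDay(PEAK_REQUESTS_PER_MINUTE);
const costPerRequest = impliedCostPerRequest(DAILY_RUN_RATE_USD, PEAK_REQUESTS_PER_MINUTE);

console.log(`${dailyRequests} requests/day, about $${costPerRequest.toFixed(4)} per request`);
```

Under those assumptions the traffic amounts to about 1.87 million requests per day at roughly half a cent each, which shows how a small per-call cost adds up to a five-figure daily bill.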

The company says the traffic came through residential proxies, making the real source of the requests difficult to identify. Over two days, hundreds of thousands of bot requests slipped past ordinary per-IP limits, which had little effect on the traffic.

Per-request verification as the proposed fix

Vercel says the answer is to verify every AI request before it reaches the model, not just to protect a login session or signup flow. In its own setup, the company routes requests through BotID, a bot detection system whose deep analysis check runs inside the route handler.
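The general shape of per-request verification can be sketched as a gate that runs before any model call. Everything named here is hypothetical: `AiRequest`, `Verifier`, `createHandler`, and the `looksAutomated` flag are illustrative stand-ins and do not represent BotID's API or Vercel's implementation. A real verifier would most likely be asynchronous and would inspect signals from the request itself; a synchronous stub keeps the example short.

```typescript
// Hypothetical request shape; `looksAutomated` stands in for whatever
// signal a real bot detector would derive from the request.
type AiRequest = { prompt: string; looksAutomated: boolean };
type HandlerResult = { status: number; body: string };

// Hypothetical verifier contract: returns true when the request should proceed.
type Verifier = (req: AiRequest) => boolean;

// Builds a handler that checks every request before the model is invoked.
function createHandler(
  verify: Verifier,
  callModel: (prompt: string) => string
): (req: AiRequest) => HandlerResult {
  return (req: AiRequest): HandlerResult => {
    // The check runs on each call, not once per session or login.
    if (!verify(req)) {
      return { status: 403, body: "Access denied" };
    }
    // Only verified requests incur inference cost.
    return { status: 200, body: callModel(req.prompt) };
  };
}

// Stub model that counts how often it is billed.
let modelCalls = 0;
const fakeModel = (prompt: string): string => {
  modelCalls++;
  return `echo: ${prompt}`;
};

// Stub verifier for the demo; a real one would not trust a client-supplied flag.
const stubVerifier: Verifier = (req) => !req.looksAutomated;

const handle = createHandler(stubVerifier, fakeModel);
const humanResult = handle({ prompt: "How do I deploy?", looksAutomated: false });
const botResult = handle({ prompt: "Write my essay", looksAutomated: true });

console.log(humanResult.status, botResult.status, modelCalls);
```

The point of the design is the ordering: the rejected request returns before `callModel` runs, so a blocked caller costs only the price of the check rather than the price of the inference.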

The company says this approach blocked more than 10,000 bot requests in the first minutes of the spike and returned the endpoint to normal traffic levels within 24 hours. Vercel also notes that visible image CAPTCHAs are increasingly ineffective because modern AI systems can often solve them.

BotID’s deep analysis is described as an invisible challenge powered by client-side machine learning that classifies requests without requiring user interaction. Vercel says that keeps the verification cost low enough to apply on every call, which is important because the protection has to be as inexpensive as the theft itself.

For developers, the company’s message is straightforward: if an AI endpoint can be reached from the internet, it needs request-level verification, not just account-level controls. Otherwise, Vercel says, attackers can turn a single bypass into a large-scale revenue drain.