Perplexity introduces hybrid local-cloud inference system at Computex

Perplexity says it can now split AI work between device and cloud

Perplexity has announced a new hybrid inference system designed to divide AI tasks between a user’s device and cloud servers, as the company pushes a vision of what it calls the next phase of personal computing. The company unveiled the approach at Computex and said the product, called Personal Computer, is set to arrive in July.

The system is built around a simple tradeoff: some AI tasks need the strongest models available, while others are better handled locally for privacy, speed or cost. Perplexity said its software is meant to decide automatically where each part of a task should run, rather than forcing users to choose between local and cloud processing ahead of time.

Aimed at sensitive and complex workloads

Perplexity said the hybrid approach is intended for work that mixes sensitive information with more demanding AI processing. Examples cited by the company include financial records, health data and personal files. In those cases, a compact model running on the device can determine which information should stay local, while the more demanding parts of the job are sent to the cloud.
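
Perplexity has not published how its on-device model makes that call. As a rough illustration only, the Python sketch below uses a few hypothetical text patterns in place of a compact local model to separate material that should stay on the device from material that could be sent to the cloud. The patterns and the function name are assumptions made for this example, not part of any Perplexity product.

```python
import re

# Hypothetical patterns standing in for a compact on-device classifier.
# A real system would likely use a small local model rather than regexes.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # shape of a US Social Security number
    re.compile(r"\b(?:diagnosis|prescription|account number)\b", re.IGNORECASE),
]


def split_by_sensitivity(snippets):
    """Return (stay_local, cloud_ok) lists for the given text snippets."""
    stay_local, cloud_ok = [], []
    for text in snippets:
        if any(pattern.search(text) for pattern in SENSITIVE_PATTERNS):
            stay_local.append(text)   # matched a sensitive pattern: keep on device
        else:
            cloud_ok.append(text)     # nothing flagged: eligible for the cloud
    return stay_local, cloud_ok
```

In this sketch, a snippet such as "My account number is 12345" would be held locally, while a general request such as "Summarize trends in renewable energy" would be eligible for cloud processing.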

The company framed the approach as an orchestration problem. Instead of sending all work to a large server-side model, the system breaks tasks into pieces and routes them to the most suitable environment. Perplexity said this is meant to improve efficiency by avoiding unnecessary use of frontier models when smaller models can handle the job.
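
The routing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Perplexity's implementation: the `Subtask` fields, the 0-to-1 complexity scale and the `LOCAL_COMPLEXITY_LIMIT` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    sensitive: bool     # assumed flag: does the piece touch private data?
    complexity: float   # assumed scale: 0.0 (trivial) to 1.0 (needs a frontier model)


# Hypothetical cutoff below which a small local model is considered sufficient.
LOCAL_COMPLEXITY_LIMIT = 0.5


def route(subtask):
    """Pick an execution environment for one piece of a larger task."""
    if subtask.sensitive:
        return "local"   # sensitive data never leaves the device in this sketch
    if subtask.complexity <= LOCAL_COMPLEXITY_LIMIT:
        return "local"   # a smaller model can handle it; skip the frontier model
    return "cloud"       # demanding and non-sensitive: send to the cloud


def plan(subtasks):
    """Map each subtask name to the environment chosen for it."""
    return {task.name: route(task) for task in subtasks}
```

Under these assumptions, a task split into reading a bank statement (sensitive), reformatting dates (simple) and researching market conditions (demanding) would send only the last piece to the cloud.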

Perplexity also said the system is model-agnostic and can work across different local hardware. The company pointed to an Intel collaboration and said the same framework applies to other local chips, including NVIDIA's RTX Spark.

The pitch: better hardware keeps more work on-device

The announcement reflects a broader industry push to move more AI inference onto personal computers and other endpoint devices. Perplexity argued that as local hardware improves, more of the work can stay on a user’s machine, leaving cloud infrastructure for tasks that genuinely require it.

The company also said that keeping routine or sensitive work on-device could ease pressure on centralized compute infrastructure. By shifting some of the load to devices already in users’ hands, Perplexity said, the approach could reduce the amount of server-side capacity required.

Perplexity cast that shift as a question of both efficiency and control. It said that moving more processing locally could help keep important data within a user’s own jurisdiction without requiring the construction of dedicated data centers for that purpose.

Perplexity positions itself around efficiency

The company said its incentive structure makes it well suited to this kind of system because its business depends on accurate answers rather than on maximizing compute use. In Perplexity’s view, that makes it more natural to optimize for value per watt and to use larger models only when necessary.

Hybrid AI has long been discussed across the industry, but Perplexity claimed its Personal Computer product is the first to make the approach practical and seamless. The company said the system is designed to treat compute itself as a resource that can be orchestrated intelligently across the local machine and the cloud.

Perplexity’s announcement signals a continuing effort to expand its platform beyond answer generation into broader AI coordination. By focusing on task routing, local inference and cloud orchestration, the company is betting that future AI systems will be judged not only by model quality, but also by how efficiently they decide where work should happen.