Elorian has raised $55 million to build AI systems focused on visual reasoning, a field the startup says is still underserved by today’s largely text-driven models.

The company argues that artificial intelligence needs to move beyond converting images into language and then reasoning from text. Instead, it wants models that can work directly with visual information and understand spatial relationships, physical constraints, design intent and other forms of structure that are often hard to describe in words.

A push beyond text-based reasoning

In its fundraising announcement, Elorian said current vision-language systems typically handle images in a two-stage process. They first translate what they see into language, then apply text-based reasoning. The company says that approach can be brittle and may fail on tasks where a visual judgment is easier than a verbal explanation. It points to examples such as evaluating athletic motion, interpreting sketches, reading scientific imagery or understanding layouts and objects in the physical world.

Elorian believes that models trained natively for visual reasoning could handle these problems more reliably. Rather than treating images as static inputs, the company says its systems are designed to interact with and manipulate visual representations while learning how structure and constraints shape the problem at hand.

The startup also drew a distinction between generative image and video models, which have advanced rapidly, and reasoning systems that can truly analyze visual content. According to Elorian, progress in generation has not closed the gap in tasks that require deep understanding of relationships inside an image.

Plans for multimodal training and new architectures

Elorian said its approach combines multimodal training with new architectures built for multimodal reasoning. The goal is to move models, over time, from basic perception toward more advanced visual problem-solving.

The company says those capabilities could have broad industrial uses. In engineering and design, it envisions AI tools that help refine products and improve efficiency. In robotics, it sees visual reasoning as a foundation for systems that can operate in new environments. It also points to medicine, scientific research, satellite analysis, weather monitoring, disaster response and precision agriculture as areas that could benefit from better visual understanding.

Elorian said it is aware of the risks that come with building systems this powerful and plans to include safeguards in its work.

Founder background

The company is led by co-founder and chief executive Andrew Dai, who previously worked at Google Brain and DeepMind. The startup highlighted his earlier work on major language model techniques, including pretraining and supervised fine-tuning, as well as work on mixture-of-experts systems and model development efforts tied to Gemini and PaLM 2.

The announcement did not specify investors or a valuation, but the new capital gives Elorian a sizable runway to pursue a technical agenda that is distinct from the dominant text-first approach in AI.

Elorian is entering a crowded field where large AI labs are racing to improve multimodal models. Its bet is that true visual reasoning will require new methods, not just bigger models or better prompts. Whether that thesis proves correct will depend on whether it can deliver systems that perform better on real-world visual tasks than today’s general-purpose AI tools.