Krea details Krea 2 image models in new technical report

Krea opens up Krea 2’s design choices

Krea has published a technical report describing Krea 2, a family of image generation foundation models built around creative exploration and user control. The company says the system is intended to offer broad aesthetic range without sacrificing performance on standard text-to-image tasks.

The report outlines the model’s data strategy, architecture, training pipeline and supporting infrastructure. Krea also released model weights, inference access and code under a permissive license, signaling an unusually open launch for a commercial image model.

The company frames Krea 2 as a response to what it sees as a growing problem in image generation: many models have become highly capable, but increasingly biased toward a narrow set of polished, default-looking outputs. Krea argues that artists and designers often need tools that help them explore styles, moods and compositions rather than simply produce one optimized result.

Data and training emphasize diversity

A major part of the report focuses on how Krea built its training data. Rather than relying on conventional quality filters alone, the company says it aimed for broad coverage across subjects and styles. It argues that some filtering systems can overvalue sharpness or photorealism and miss deliberate artistic effects such as blur or softness.

Krea says it removed duplicates, overrepresented concepts, samples that caused unwanted biases or artifacts, content too complex for low-resolution modeling, and AI-generated images. Synthetic images were excluded entirely from pretraining, the company says, because even a small share of them can skew model behavior.
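As a rough illustration of three of those filtering steps, the sketch below drops synthetic images, removes exact duplicates and caps overrepresented concepts. The record fields (`image_hash`, `concept`, `is_synthetic`) and the cap value are hypothetical; nothing here describes how Krea actually implemented its filters.

```python
from collections import Counter


def filter_samples(samples, max_per_concept=2):
    """Drop synthetic images, exact duplicates, and samples beyond a per-concept cap.

    Each sample is a dict with hypothetical keys:
    image_hash, concept, is_synthetic.
    """
    seen_hashes = set()
    concept_counts = Counter()
    kept = []
    for sample in samples:
        if sample["is_synthetic"]:
            continue  # AI-generated images are excluded entirely
        if sample["image_hash"] in seen_hashes:
            continue  # exact duplicate of an image already kept
        if concept_counts[sample["concept"]] >= max_per_concept:
            continue  # this concept is already well represented
        seen_hashes.add(sample["image_hash"])
        concept_counts[sample["concept"]] += 1
        kept.append(sample)
    return kept
```

A real pipeline would likely use perceptual hashes or embeddings for near-duplicate detection and a classifier to flag synthetic images; the exact-match hash and boolean flag here only stand in for those steps.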

To create captions, Krea used a multi-step process that started with OCR and then combined extracted text with metadata and world knowledge. Those richer captions were later reformatted into different prompt lengths so the model could train on both detailed and shorter user inputs.
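The idea of deriving several prompt lengths from one rich caption can be sketched as follows. This is a simplified stand-in: it assumes the OCR text, metadata and description are already available as strings, and it shortens captions by keeping the first sentences, whereas Krea's pipeline presumably rewrites them with a model. The function name and the `title` metadata key are hypothetical.

```python
import re


def caption_variants(ocr_text, metadata, description):
    """Build a rich caption, then derive shorter variants from the description."""
    parts = [description.strip()]
    if metadata.get("title"):
        parts.append(f"Title: {metadata['title']}.")
    if ocr_text.strip():
        parts.append(f'Visible text: "{ocr_text.strip()}".')
    long_caption = " ".join(parts)

    # Split the description into sentences at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    return {
        "long": long_caption,
        "medium": " ".join(sentences[:2]),
        "short": sentences[0],
    }
```

Training on all three variants of the same image is one way to expose a model to both detailed and terse inputs.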

The pretraining process itself was staged across 256, 512 and 1024 pixel resolutions. Krea describes this as a curriculum that first teaches basic alignment and structure at lower resolutions before moving toward higher fidelity generation.
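A staged curriculum like this amounts to a lookup from training step to resolution. In the minimal sketch below, only the three resolutions come from the report; the step budgets and the function name are placeholders chosen for illustration.

```python
def resolution_for_step(step, stages):
    """Return the training resolution for a global step.

    stages is a list of (resolution, num_steps) pairs in training order.
    """
    boundary = 0
    for resolution, num_steps in stages:
        boundary += num_steps
        if step < boundary:
            return resolution
    return stages[-1][0]  # stay at the final resolution once the schedule ends


# Hypothetical step budgets; only the resolutions are taken from the report.
STAGES = [(256, 1000), (512, 500), (1024, 250)]
```

With these placeholder budgets, steps 0 to 999 train at 256 pixels, steps 1000 to 1499 at 512, and everything after that at 1024.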

Architecture and control features

Krea says Krea 2 is based on a diffusion transformer architecture refined through ablation testing. The system incorporates several components meant to speed training and improve stability, among them iREPA, improved variational autoencoders and Qwen3-VL. The report also highlights architectural choices such as grouped-query attention, sigmoid-gated attention and lightweight timestep modulation.
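For readers unfamiliar with two of those terms, the NumPy sketch below combines grouped-query attention, in which several query heads share one key/value head, with a sigmoid gate applied to the attention output. It follows one common formulation of output gating, with the gate computed from the layer input; whether Krea 2 places its gate the same way is an assumption, and all names and shapes are illustrative.

```python
import numpy as np


def gated_gqa(x, wq, wk, wv, wg, num_heads, num_kv_heads):
    """Grouped-query attention with a sigmoid output gate, for one sequence.

    x:      (seq, dim)
    wq, wg: (dim, num_heads * head_dim)
    wk, wv: (dim, num_kv_heads * head_dim)
    """
    seq, _ = x.shape
    head_dim = wq.shape[1] // num_heads
    group = num_heads // num_kv_heads  # query heads sharing one K/V head

    q = (x @ wq).reshape(seq, num_heads, head_dim).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, num_kv_heads, head_dim).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, num_kv_heads, head_dim).transpose(1, 0, 2)

    # Repeat each K/V head so it lines up with its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, seq, seq)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = (weights @ v).transpose(1, 0, 2).reshape(seq, num_heads * head_dim)
    gate = 1.0 / (1.0 + np.exp(-(x @ wg)))  # sigmoid gate from the layer input
    return out * gate
```

Sharing key/value heads shrinks the K and V projections relative to full multi-head attention, while the gate lets the layer scale each output channel per token.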

One of the report’s central themes is steerability. Krea says it built a prompt expander that turns underspecified prompts into richer visual directions while preserving the user’s intent. It also created a style-reference system that lets users guide generation with one or more images, including controls for style strength and mixing.

These systems are meant to close the gap between how the model is trained and how people actually describe creative ideas. Krea says users often express intent in fragments, moods or reference images, rather than in the long, detailed captions that help the model learn.
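The style strength and mixing controls can be pictured as a weighted blend of reference embeddings. The sketch below is purely illustrative: it assumes each reference image has already been encoded into a vector, and the function name and parameters are hypothetical, not Krea's interface.

```python
def mix_styles(style_embeddings, mix_weights, strength):
    """Blend style-reference embeddings and scale the result by a strength in [0, 1]."""
    if len(style_embeddings) != len(mix_weights):
        raise ValueError("one weight per style reference is required")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    total = sum(mix_weights)
    if total <= 0:
        raise ValueError("mix weights must sum to a positive value")

    blended = [0.0] * len(style_embeddings[0])
    for embedding, weight in zip(style_embeddings, mix_weights):
        for i, value in enumerate(embedding):
            blended[i] += (weight / total) * value  # normalized weighted average
    return [strength * value for value in blended]
```

A strength of 0 removes the style signal entirely, while unequal mix weights let one reference dominate the blend.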

Midtraining, curation and benchmark claims

Krea also describes a midtraining stage that uses carefully selected image sources to bridge the gap between broad pretraining and more targeted supervised fine-tuning. The company says it used semantic clustering, retrieval and human review to maintain coverage of both common and rare concepts.
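The retrieval step can be illustrated with a nearest-neighbor lookup over embeddings, for example to find more images of a rare concept starting from a single example. This is a minimal sketch using cosine similarity over plain Python lists; the embeddings, identifiers and function names are hypothetical, and a production system would use a vector index instead of a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, candidates, k=2):
    """Return the ids of the k candidates most similar to the query embedding."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]
```

Retrieved candidates would then go to human review, which the report names as the final check on coverage.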

The report says Krea 2 is among the top 10 models on the Artificial Analysis text-to-image leaderboard and ranks second among models from independent labs. Krea presents that result as evidence that its focus on exploration does not come at the expense of competitive quality.

The publication adds another detailed entry to the increasingly technical arms race among image model developers. But Krea’s report stands out for arguing that better image generation is not only about realism or prompt adherence. It is also about helping users discover a wider visual space.