OpenAI unveils o3 and o4-mini with expanded reasoning and tool use

OpenAI on April 16 introduced two new reasoning models, o3 and o4-mini, describing them as its most capable systems yet for ChatGPT users and developers. The company said both models are designed to think longer before answering and to use tools more intelligently across tasks involving text, images, code, and web data.

The release marks a broader push toward what OpenAI calls a more agentic ChatGPT, meaning a system that can carry out multi-step work more independently. For the first time, OpenAI said its reasoning models can combine every tool available in ChatGPT, including web search, Python-based analysis of uploaded files, image generation, and visual interpretation. The company said the models are trained not just to use tools, but to decide when using them is worthwhile.

OpenAI said o3 is its strongest reasoning model so far, with particular gains in coding, math, science, and visual perception. The company said o3 reached new benchmark highs on tests including Codeforces, SWE-bench, and MMMU. According to OpenAI, external evaluators found that o3 made fewer major errors than the earlier o1 model on difficult real-world tasks. The company said those evaluators saw the model as especially strong in programming, consulting-style analysis, and creative ideation.

The smaller o4-mini model is aimed at faster and lower-cost reasoning. OpenAI said it performs especially well in math, coding, and visual tasks, while supporting higher usage limits than o3. The company said o4-mini posted strong benchmark results on AIME 2024 and 2025, and that the model performs better than its predecessor, o3-mini, on several non-STEM and data science tasks.

A key theme in the announcement was multimodal reasoning. OpenAI said both models can incorporate images directly into their internal reasoning process, rather than merely describing them at the output stage. That capability, the company said, should allow users to upload photos of whiteboards, diagrams, or sketches, even if the image quality is poor or the orientation is unusual. With tool use, the models can also manipulate images by rotating or zooming them as part of the analysis.

OpenAI also emphasized tool use as a core part of the models’ design. In one example, the company described a user asking about future summer energy usage in California. The model could search for current data, run Python code to build a forecast, create a chart, and explain the drivers behind the estimate in a single workflow. OpenAI said the models can also search multiple times and adjust their approach as new information appears.
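As a rough illustration of the forecasting step in that workflow, the sketch below fits a straight-line trend to a short series and extrapolates it. OpenAI did not publish the code the model ran, so this is only an assumption about what a simple version might look like. The function names are hypothetical, and the numbers are made-up placeholders, not real California energy data.

```python
def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, future_x):
    """Extrapolate the fitted trend line to a future x value."""
    slope, intercept = fit_linear_trend(xs, ys)
    return slope * future_x + intercept


# Placeholder values for illustration only; not real usage figures.
years = [2020, 2021, 2022, 2023]
usage = [100.0, 102.0, 104.0, 106.0]
estimate_2025 = forecast(years, usage, 2025)
```

A real forecast would start from the data the search step retrieved and would likely account for seasonality and weather rather than a single linear trend.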

The company said the models were trained with large-scale reinforcement learning, and that greater compute and more time to reason continued to improve performance. It said that at the same latency and cost as o1, o3 performs better in ChatGPT, and that giving it more time to think improves results further.

OpenAI also said the two models produce more natural, conversational responses than earlier reasoning systems, and that they are better at following instructions and providing answers that are both useful and verifiable. The company noted that both models are available through ChatGPT, with API support for custom tools via function calling.
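For readers unfamiliar with function calling, the sketch below shows the general pattern on the developer's side. A tool is described with a name and a JSON-Schema-style parameter list, and local code runs the matching function when the model requests it. Everything here is a hypothetical illustration: the tool `get_energy_usage`, the registry, and the dispatcher are invented for this example, no request is sent to any API, and the exact field layout the API expects should be checked against OpenAI's documentation.

```python
import json


def get_energy_usage(region: str, year: int) -> dict:
    """Hypothetical custom tool; returns a stub record, not real data."""
    return {"region": region, "year": year, "usage_gwh": None}


# JSON-Schema-style description of the tool (assumed layout, for illustration).
TOOL_SPEC = {
    "name": "get_energy_usage",
    "description": "Look up annual energy usage for a region.",
    "parameters": {
        "type": "object",
        "properties": {
            "region": {"type": "string"},
            "year": {"type": "integer"},
        },
        "required": ["region", "year"],
    },
}

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_energy_usage": get_energy_usage}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested tool and return its result as a JSON string."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    result = REGISTRY[name](**args)
    return json.dumps(result)


reply = dispatch_tool_call(
    "get_energy_usage", '{"region": "California", "year": 2025}'
)
```

In this pattern the model only proposes a tool name and arguments. The developer's code decides whether and how to run the call, then passes the result back to the model.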

In a later update, OpenAI said o3-pro became available to Pro users in ChatGPT and in the API on June 10. The company described that version as a longer-thinking variant of o3 designed to deliver more reliable responses.