OpenAI says Codex helped a tax agent improve itself during tax season


OpenAI says a tax preparation agent built with Thrive Holdings got better during tax season, using feedback from practitioners to improve its performance on increasingly complex work. The company says the project offers a blueprint for how AI systems can learn from mistakes in enterprise settings.

The system, called Tax AI, was developed by OpenAI forward-deployed engineers (FDEs) working alongside Thrive Holdings engineers. It helps automate parts of preparing 1040 and 1041 tax returns. OpenAI said the tool initially handled simpler tasks such as ingesting W-2 and 1099 forms, then took on more difficult work as the season progressed.

According to OpenAI, the agent reduced tax document preparation time by about one-third while maintaining accuracy as high as 97%. The company also said the system improved over time on field completion rate, a metric OpenAI uses to gauge whether tax return forms are filled out correctly.

At launch, just 25% of returns reached 75% or higher field completion. Within six weeks, that figure rose to 86%. OpenAI said the system eventually reached 100% correct field completion on 90% of returns.
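
OpenAI has not published how field completion is calculated, so the following is only a minimal sketch of how a metric of this kind could be computed. It assumes field completion is the fraction of fields on a return that match the practitioner-approved values; the function names, field names and toy data are all hypothetical.

```python
from typing import Dict, List


def field_completion(filled: Dict[str, str], expected: Dict[str, str]) -> float:
    """Fraction of expected fields whose filled value matches (assumed definition)."""
    if not expected:
        return 1.0
    correct = sum(1 for name, value in expected.items() if filled.get(name) == value)
    return correct / len(expected)


def share_meeting_threshold(rates: List[float], threshold: float) -> float:
    """Share of returns whose field completion is at or above the threshold."""
    if not rates:
        return 0.0
    return sum(1 for r in rates if r >= threshold) / len(rates)


# Hypothetical practitioner-approved values for one simplified return layout.
expected = {"wages": "52000", "interest": "120", "rents": "18000", "state": "NY"}

# Hypothetical agent output for four returns sharing that layout.
returns = [
    {"wages": "52000", "interest": "120", "rents": "18000", "state": "NY"},  # 4 of 4
    {"wages": "52000", "interest": "120", "rents": "17500", "state": "NY"},  # 3 of 4
    {"wages": "52000", "interest": "", "rents": "", "state": "NY"},          # 2 of 4
    {"wages": "51000"},                                                      # 0 of 4
]

rates = [field_completion(r, expected) for r in returns]
print(share_meeting_threshold(rates, 0.75))  # share of returns at 75% or higher
print(share_meeting_threshold(rates, 1.0))   # share of returns fully correct
```

Under this reading, the figures OpenAI cites are shares of returns clearing a threshold (75% at first, then 100%), not averages of the per-return rate.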

The company attributes those gains to a three-step improvement loop. First, practitioners identify errors and guide what the system should learn. Second, the system tracks the process in more detail so corrections can be converted into evaluations. Third, a Codex-based improvement loop turns those evaluations into engineering tasks that help refine the overall workflow.

OpenAI gave an example involving rental property income and Schedule E forms. In that workflow, Tax AI extracts information from messy source materials, including handwritten notes, emails and spreadsheets, then maps the data into a tax engine for practitioner review. When errors are found, the corrections are captured as structured data, grouped into recurring patterns and passed to Codex as scoped tasks.
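
The capture-and-group step described above can be illustrated with a short sketch. This is not OpenAI's schema or code: the `Correction` record, the grouping key and the task format are assumptions made for illustration, and the handoff to Codex itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Correction:
    """One practitioner correction, captured as structured data (hypothetical schema)."""
    return_id: str
    form: str
    field: str
    error_type: str
    extracted: str
    corrected: str


PatternKey = Tuple[str, str, str]


def group_corrections(
    corrections: List[Correction], min_count: int = 2
) -> Dict[PatternKey, List[Correction]]:
    """Group corrections by (form, field, error type) and keep recurring ones."""
    groups: Dict[PatternKey, List[Correction]] = defaultdict(list)
    for c in corrections:
        groups[(c.form, c.field, c.error_type)].append(c)
    return {key: items for key, items in groups.items() if len(items) >= min_count}


def to_scoped_tasks(patterns: Dict[PatternKey, List[Correction]]) -> List[dict]:
    """Turn each recurring pattern into a narrowly scoped task with eval cases."""
    tasks = []
    for (form, field, error_type), items in sorted(patterns.items()):
        tasks.append({
            "title": f"Fix {error_type} on {form} / {field}",
            "occurrences": len(items),
            "eval_cases": [
                {"return_id": c.return_id, "expected": c.corrected} for c in items
            ],
        })
    return tasks


# Hypothetical corrections from practitioner review.
corrections = [
    Correction("R-001", "Schedule E", "rents_received", "missed_handwritten_note", "0", "18000"),
    Correction("R-002", "Schedule E", "rents_received", "missed_handwritten_note", "0", "9600"),
    Correction("R-003", "Schedule E", "repairs", "wrong_column", "450", "0"),
]

tasks = to_scoped_tasks(group_corrections(corrections))
for task in tasks:
    print(task["title"], task["occurrences"])
```

In this sketch, the two matching Schedule E corrections become one task with two evaluation cases, while the one-off correction stays below the `min_count` threshold and is left out.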

OpenAI emphasized that what improves itself is the harness around the model, not the model itself. The company said the Tax AI system runs on OpenAI’s Codex harness, which is being iteratively updated based on practitioner input. Because the harness is open source, OpenAI said other developers can use it to build similar systems.

John de Wasseige, one of the FDE leads on the project, said the approach is broadly applicable beyond tax preparation. OpenAI said it is already working with Thrive Holdings to adapt the same workflow to accounting tasks such as booking, audit and operational processes.

The effort comes as businesses continue to press AI vendors on accuracy, especially in high-stakes enterprise workflows. OpenAI framed the project as a way to make AI behave more like a skilled coworker that remembers corrections and avoids repeating the same mistakes.

The company said it published the account to give developers a practical model for building similar systems and to encourage more work on AI that improves through feedback after deployment rather than remaining static.