OpenAI says AI chemist improved a medicinal chemistry reaction

OpenAI says a near-autonomous AI chemistry system helped improve a difficult reaction used in drug discovery, identifying an additive that raised yields across most tested substrates.

The company described the work on June 17 as a collaboration with Molecule.one, a chemistry automation firm that runs experiments in a high-throughput laboratory. In the project, OpenAI connected GPT-5.4 to Molecule.one’s Maria system and gave it a broad goal: find ways to improve a reaction class used in medicinal chemistry.

The target was Chan-Lam coupling, a method chemists use to build carbon-nitrogen bonds. Those bonds are common in many drug-like molecules, but the reaction has long been unreliable for some substrate classes. OpenAI said GPT-5.4 independently focused on primary sulfonamides as a challenging and valuable target and suggested that mild oxidants, including TEMPO, might help.

That idea turned into a two-round experimental campaign in Maria Lab. According to OpenAI, the optimized conditions improved yields for 88% of the boronic acids tested and 83% of the sulfonamides tested. Average yield increased from 16.6% to 25.2%, while the share of reactions above 30% yield rose from 15.6% to 37.5%.
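To put the reported figures in perspective, the short Python sketch below works out the absolute and relative change for each pair of numbers. It uses only the four percentages quoted above; the function name `summarize_change` is illustrative and not something OpenAI published.

```python
def summarize_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100
    return round(absolute, 1), round(relative, 1)


# Figures as reported by OpenAI for the optimized conditions.
avg_yield = summarize_change(16.6, 25.2)          # average yield
share_above_30 = summarize_change(15.6, 37.5)     # share of reactions above 30% yield

print(avg_yield)        # (8.6, 51.8)
print(share_above_30)   # (21.9, 140.4)
```

In other words, average yield rose by 8.6 percentage points, roughly a 52% relative gain, while the share of reactions clearing the 30% mark rose by 21.9 points, about 2.4 times the starting share.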

Human chemists then repeated a selection of the reactions at bench scale. OpenAI said those manual tests confirmed the micro-scale results, with improved yields in 11 of 14 substrate pairs and more than a twofold increase in most of those cases. The company said that matters because chemistry that works in tiny screening experiments does not always translate to practical lab workflows.

The project ran 10,080 reactions in total, a scale OpenAI said was necessary because a result in chemistry can be misleading if it is tested on only a few examples. The system first generated and ranked thousands of proposals, and scientists selected a small set for lab testing. Maria then converted the chosen ideas into detailed instructions, executed the experiments, analyzed the results, and fed them back to GPT-5.4 for follow-up planning.
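The loop described above can be pictured with a minimal Python sketch. Everything in it is a hypothetical stand-in: the `Proposal` class, the `propose`, `select` and `execute` stubs, and the fake yields are invented for illustration and do not reflect the actual GPT-5.4 or Maria interfaces, which the announcement does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    name: str
    score: float  # hypothetical ranking score assigned by the model


def run_round(propose, select, execute, history):
    """One pass: generate and rank ideas, let a human pick, run them, record results."""
    ranked = sorted(propose(history), key=lambda p: p.score, reverse=True)
    chosen = select(ranked)                       # human-in-the-loop choice
    results = {p.name: execute(p) for p in chosen}
    history.append(results)                       # fed back for follow-up planning
    return results


# Stub components (assumptions, not real systems).
def propose(history):
    round_no = len(history) + 1
    return [Proposal(f"r{round_no}-idea-{i}", score=i / 10) for i in range(5)]


def select(ranked):
    return ranked[:2]  # stand-in for scientists choosing a small set


def execute(proposal):
    return round(proposal.score * 50, 1)  # fake yield in percent


history = []
for _ in range(2):  # the campaign was described as two rounds
    run_round(propose, select, execute, history)

print(len(history))  # 2
```

The point of the sketch is the shape of the loop: the model only sees what earlier rounds recorded in `history`, and the selection step stays in human hands.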

OpenAI said the strongest proposal was labeled OAI-M1-03. A follow-up finding suggested that TEMPO could be swapped for 4-hydroxy-TEMPO, a cheaper related compound, with little drop in performance.

The company emphasized that the work was not fully autonomous. Human chemists remained involved throughout the process, choosing which ideas advanced to the lab, making some corrections to experimental plans, supporting lab operations, and validating the final results. OpenAI also said the project was intentionally limited to a legitimate medicinal chemistry problem and did not test harmful chemical applications.

The results are still early. Bench validation covered 14 representative substrate pairs, and OpenAI said further work is needed to determine how broadly the improvement applies, why the additive helps, and whether independent labs can reproduce the findings.

The company framed the project as an example of how AI systems could become more useful partners in scientific research, not by reasoning alone but by taking part in the full loop of proposing ideas, designing experiments, interpreting data, and suggesting next steps. For medicinal chemistry, where the ability to make molecules often determines what can be tested, even modest gains in a widely used reaction could be significant.

OpenAI said the broader goal is to build AI tools that help researchers move faster while still operating under human oversight and safety safeguards.