OpenAI says a reasoning model helped researchers revisit hundreds of difficult pediatric cases and uncover new diagnostic leads for rare genetic diseases.
In a study published June 18 in NEJM AI, researchers from Boston Children’s Hospital, Harvard University and OpenAI used the company’s o3 Deep Research model to reanalyze 376 previously unsolved cases. After human review, additional testing and clinical confirmation, physicians established 18 diagnoses, an added diagnostic yield of 4.8% across the cohort.
The findings add to a growing body of work suggesting that artificial intelligence can help specialists sift through complex genetic and clinical data, especially when earlier reviews have failed to produce an answer. The study did not test the model as a standalone diagnostic tool. Instead, it positioned the system as a way to surface evidence-linked hypotheses for clinicians to evaluate.
Rare-disease genetics often involves more than sequencing a genome. Physicians may need to connect clinical symptoms, family history, variant data and research literature that are spread across different systems and updated over time. Even after extensive testing, roughly half of patients with suspected rare genetic disease still do not receive a clear diagnosis.
The researchers argued that some earlier unsolved cases become interpretable later as new gene-disease links, case reports and variant classifications emerge. That makes periodic reanalysis important, but also labor-intensive for clinicians who may be tracking large backlogs of unresolved cases.
To address that problem, the team built a workflow that asked the model to produce an explanation rather than only a ranked list of genes. For each case, the model reviewed a de-identified data packet that included standardized phenotype terms, notes from clinicians and filtered variant tables with quality and rarity information. It was asked to identify the most plausible molecular explanation and show how it reached that conclusion.
Human experts then evaluated the output using the same ACMG/AMP framework used in clinical genetics. At least two reviewers examined each candidate diagnosis, and no result was treated as a diagnosis until it was confirmed in a CLIA-certified laboratory and returned through the clinical team.
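The review gates described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's software: the `Candidate` class, its field names, the status labels and the reading of "at least two reviewers" as two supporting reviews are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MIN_REVIEWERS = 2  # the article says at least two reviewers examined each candidate


@dataclass
class Candidate:
    """Hypothetical record for one model-suggested explanation."""
    case_id: str
    gene: str
    supporting_reviewers: set = field(default_factory=set)  # assumed: reviewers who agreed
    clia_confirmed: bool = False             # confirmed in a CLIA-certified laboratory
    returned_by_clinical_team: bool = False  # result returned through the clinical team


def status(candidate: Candidate) -> str:
    """Return how far a candidate has progressed through the review gates."""
    if len(candidate.supporting_reviewers) < MIN_REVIEWERS:
        return "hypothesis"
    if not (candidate.clia_confirmed and candidate.returned_by_clinical_team):
        return "candidate"
    return "diagnosis"


# Placeholder identifiers, not real study data.
example = Candidate(case_id="case-001", gene="GENE1")
print(status(example))  # hypothesis
example.supporting_reviewers.update({"reviewer_a", "reviewer_b"})
print(status(example))  # candidate
example.clia_confirmed = True
example.returned_by_clinical_team = True
print(status(example))  # diagnosis
```

The point of the sketch is the ordering: model output alone never reaches the "diagnosis" state, which requires both expert review and laboratory confirmation.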
Before applying the workflow to unsolved cases, researchers tested it on cases with known answers. Across repeated runs on those benchmarks, it recovered the correct gene and variant in most cases, though not all. The authors said those runs helped refine the process while showing that expert review remained essential.
The study focused on four groups of previously unresolved cases: children with neurodevelopmental conditions, people with rare neuromuscular disease, children and adolescents with early psychosis, and cases of sudden unexpected death in pediatrics.
The largest number of diagnoses came from the neurodevelopmental group, where 10 were found among 100 cases. The workflow also led to four diagnoses among 61 neuromuscular cases, two among 15 early psychosis cases and two among 200 sudden unexpected death cases.
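The per-group figures can be checked against the headline number with a short calculation. The sketch below uses only the counts reported in the article; the variable and function names are chosen for illustration.

```python
# (confirmed diagnoses, cases reanalyzed) for each group, as reported in the article
cohorts = {
    "neurodevelopmental": (10, 100),
    "neuromuscular": (4, 61),
    "early psychosis": (2, 15),
    "sudden unexpected death": (2, 200),
}


def yield_pct(diagnoses: int, cases: int) -> float:
    """Diagnostic yield as a percentage of cases reanalyzed."""
    return 100.0 * diagnoses / cases


total_dx = sum(d for d, _ in cohorts.values())
total_cases = sum(c for _, c in cohorts.values())
overall = yield_pct(total_dx, total_cases)

for name, (d, c) in cohorts.items():
    print(f"{name}: {d}/{c} = {yield_pct(d, c):.1f}%")
print(f"overall: {total_dx}/{total_cases} = {overall:.1f}%")
```

The totals come to 18 diagnoses among 376 cases, or about 4.8%. By rate rather than count, the early psychosis group is highest at roughly 13.3%, though it rests on only 15 cases.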
Some of the findings involved information that had already existed in public databases but had not been linked together in the local records reviewed by the team. In one psychosis case, the model inferred a likely 22q11.2 deletion, a structural change later confirmed by follow-up sequencing. In other cases, it suggested multi-gene explanations that better fit a complex set of symptoms.
The researchers also pointed to one case in which the model proposed a possible new biological link between an S1PR1 deletion and vitiligo. That hypothesis remains unconfirmed and would need further experimental work.
The study included a detailed example of a patient whose diagnosis emerged after nearly two decades without an answer. In the case of a girl named Kyra, muscle weakness that began in childhood was eventually linked to a variant in HSPB8, leading to a diagnosis of myofibrillar myopathy.
OpenAI and the researchers stressed that the model did not diagnose patients or make clinical decisions. They said the work shows how AI may help experts narrow a search, especially when knowledge is changing quickly, but that any real-world use would still require medical oversight, confirmatory testing and careful attention to privacy and regulation.