Stanford study says AI tutors beat law faculty in contract-law office-hours test

A Stanford University study has found that AI tutoring systems can outperform law faculty in a blind test built around contract-law office hours, a result that could influence how legal education uses generative AI.

The study, from Stanford's law school, compared answers from AI tutors with those from law professors and instructors in a setting designed to resemble student office hours. In the blind evaluation, participants reviewed responses without knowing whether they came from a human faculty member or an AI system. The AI-generated answers were rated more highly overall, according to the study.

The finding is notable because office hours are one of the most common ways law students seek help with dense reading assignments, doctrinal questions and exam preparation. Contract law, in particular, often requires students to navigate technical language, subtle distinctions and layered hypotheticals. The study suggests that AI tools may be able to deliver clear, responsive explanations in that setting, at least when judged on the quality of written answers.

The result does not mean AI is ready to replace law professors. The study focused on a narrow academic task and evaluated response quality in a controlled comparison, not the full range of teaching, mentoring and judgment that faculty provide. Nor is there any indication that the researchers concluded AI should serve as a substitute for human instruction. Instead, the findings point to the growing capability of AI systems in specialized knowledge domains.

The study arrives as law schools continue to debate how much generative AI should be used in classrooms and student support services. Some institutions have moved cautiously, worrying about accuracy, transparency and overreliance on automated tools. Others are experimenting with AI as a supplement for practice questions, feedback and explanation of complex concepts. A strong showing in a blind office-hours test could accelerate that conversation.

For legal educators, the implications may be practical as well as philosophical. If AI tools can reliably produce useful responses to student questions about contract doctrine, schools may consider deploying them to handle routine inquiries, leaving faculty with more time for deeper discussion and individualized mentoring. At the same time, any adoption would likely require safeguards around correctness, bias and the limits of AI reasoning.

The Stanford study adds to a broader body of research showing that generative AI is increasingly competitive with human experts in structured tasks that rely on synthesis and explanation. In education, that may be especially relevant for high-volume support roles where students need immediate answers. But the study also underscores the difference between generating a polished response and teaching a subject in a sustained, interactive way.

As law schools weigh that tradeoff, the Stanford findings are likely to serve as another sign that AI is moving from a novelty to a serious tool in professional education. The challenge now is determining where it can help students most, and where human expertise remains essential.