AI Can Grade Law School Exams About As Well As Professors

Can legal education stay the same in the AI era? That is the key question a pioneering new study of AI performance raises. The study found that AI models can grade law school exams with nearly the same accuracy as human professors. The team’s central finding: ‘when supplied with grading rubrics, [AI] produced scores that closely tracked human graders with correlations up to 0.93’.

The study concluded: ‘Our analysis suggests that LLMs have the capacity to roughly approximate the grading of a law professor, with the best results coming from prompts that incorporate detailed rubrics.’

Daniel Schwarcz, Professor at the University of Minnesota and one of the academics who conducted the study, told Artificial Lawyer: ‘The implications for legal education and lawyers are significant, I think.

‘Even if AI never replaces human grading, these tools already appear capable of offering meaningful feedback on practice exams, midterms, draft memos, and other legal writing exercises that students often struggle to get reviewed. For young associates, the same techniques point toward new ways to refine written advocacy before it ever reaches a partner’s desk.’

Artificial Lawyer then asked: if we accept that most students are already using AI to help them learn, that much of the legal education corpus can be accessed digitally, and that AI can grade papers and exams… then do we need law professors, at least in the way we have them now?

In fact, one could go even further: do we need law schools at all, at least brick-and-mortar ones that charge enormous fees while offering their students an unpredictable future? Doesn’t this study point to a new kind of law school experience?

Schwarcz told Artificial Lawyer: ‘My answer is yes, we do need law professors, because it’s still the case that many students learn best from in-person instruction from humans. I also think we’re not yet at the point where we can totally outsource grading of law school exams to AI; we don’t yet know whether AIs are as reliable as humans in grading exams, and even in our experiment we had human law professors write the exams and the grading rubrics.

‘So, at least for now, there is still lots for law profs to do. But AI can absolutely improve law school education by providing lots more opportunities for students to track the quality of their work and get customized tips for improving.’

As for how they ran their experiments, the full design is fairly involved, but in short: they wrote detailed grading rubrics for the AI to follow when marking the papers, then compared the scores the AI produced with those given by human law professors.
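To make that comparison concrete, here is a minimal sketch in Python of the kind of agreement check the paper reports: given a professor’s scores and an LLM’s scores for the same set of answers, compute the Pearson correlation coefficient (the study found values of up to 0.93). The scores below are invented purely for illustration; they are not the study’s data.

```python
# A minimal sketch of the human-vs-LLM agreement check described above.
# The scores are hypothetical, invented purely for illustration.
from math import sqrt
from statistics import mean

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Ten hypothetical exam answers, scored out of 100 by each grader.
professor_scores = [72, 85, 64, 90, 58, 77, 81, 69, 93, 75]
llm_scores = [70, 88, 61, 92, 60, 74, 83, 72, 90, 78]

print(f"Pearson r = {pearson(professor_scores, llm_scores):.2f}")
```

A coefficient near 1.0 means the two graders rank and space the answers almost identically, which is the sense in which the LLM ‘closely tracked’ the human graders.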

Here is the introduction to the paper:

‘This paper presents results of an analysis of how LLMs perform in evaluating student responses to legal analysis questions of the kind typically administered in law school exams. The underlying data come from exams in four subjects administered at top-30 U.S. law schools. Unlike some projects in computer or data science, our goal is not to design a new LLM that minimizes error or that maximizes agreement with human graders. Rather, we seek to determine whether existing models—which can be straightforwardly applied by most professors and students—are already suitable for the task of law exam evaluation. We find that, when provided with a detailed rubric, the LLM grades correlate with the human grader at Pearson correlation coefficients of up to 0.93. Our findings suggest that, even if they do not fully replace humans in the near future, LLMs could soon be put to valuable tasks by law school professors, such as reviewing and validating professor grading, providing substantive feedback on ungraded midterms, and providing students feedback on self-administered practice exams.’

And the conclusion:

‘Our analysis suggests that LLMs have the capacity to roughly approximate the grading of a law professor, with the best results coming from prompts that incorporate detailed rubrics. With LLMs’ rapid improvement in text analysis tasks over just the last few years, we can probably expect continued improvement in the near future. Future analysis—including on different exams on different subjects—is likely to produce even greater agreement between humans and machines. Routine updates to this study will be necessary as the LLMs evolve.

‘Even as LLMs increasingly approximate human graders’ performance, we acknowledge that logistical, institutional, and political challenges exist to replacing law professor grading. For instance, some law schools have longstanding rules requiring that professors (rather than, say, teaching assistants) personally grade exams and assign grades. It is unclear if machine-graded exams would comply with such rules. But even if they would or if the rules are changed, other political forces and path dependence may prevent law schools from replacing human graders with LLMs for the foreseeable future.

‘Even so, our findings suggest that automated legal analysis scoring could be effectively used for other valuable grading tasks in the very near future. For instance, professors could use LLMs as a supplement to their own grading, reviewing their own evaluations to detect any errors and bias. In addition, students might use LLMs and instructor-supplied rubrics to receive feedback on their practice exams, thereby potentially meeting otherwise burdensome ABA requirements to provide formative assessments in all first-year law school classes.’
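As a rough illustration of what that last suggestion could look like in practice, here is a hypothetical sketch of rubric-based feedback using the OpenAI Python client. The model name, rubric, and prompt wording are all assumptions made for this example; the study’s actual prompts and pipeline are described in the paper itself.

```python
# Hypothetical sketch of rubric-based grading with an LLM, using the
# OpenAI Python client (pip install openai). The rubric, prompt wording,
# and model choice are illustrative assumptions, not the study's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

rubric = """Issue spotting (0-10): identifies the governing legal issues.
Rule statement (0-10): states the applicable rules accurately.
Application (0-10): applies the rules to the facts with analysis.
Conclusion (0-10): reaches a reasoned, supported conclusion."""

def grade_answer(answer: str) -> str:
    """Ask the model to score one exam answer against the rubric."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are grading a law school exam answer. "
                        "Score each rubric element and give a total."},
            {"role": "user",
             "content": f"Rubric:\n{rubric}\n\nStudent answer:\n{answer}"},
        ],
    )
    return response.choices[0].message.content

print(grade_answer("The defendant's conduct likely constitutes negligence..."))
```

In a real deployment the instructor’s full rubric would be supplied, and the output validated against human grades, exactly as the study does.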

Final thoughts from AL

As noted, legal education is entering a new world where large parts of the experience can be assisted by AI. Some will then ask: if that is so, do we need law schools as they currently exist?

Do law schools have to change with the times, especially given their huge fees and the reality that teaching, materials, and now grading can all operate in a much more digital way, aided by AI? In other words, this is not about law schools teaching students to use AI tools, but about AI disrupting their own business and educational model.

Many thanks to Professor Schwarcz and colleagues for the fascinating study.

P.S. AI grading performance will likely only get better in the future…

