
Legal tech startup Paxton AI, which focuses on contract drafting, review and research, has launched ‘Citator’, a now patent-pending feature that checks the standing and precedential value of case law.
Citation tools track whether a case has been overruled, affirmed, questioned, or cited by later cases, the company added. ‘Traditionally, citators like LexisNexis’s Shepard’s, Thomson Reuters’s KeyCite, and Bloomberg’s BCite have been managed by human editors and reliant on extensive legal databases. Human-based citators have been essential for legal research, yet they can be expensive, time-consuming, and prone to error’, they explained.
So, ‘leveraging artificial intelligence, the Paxton AI Citator addresses these limitations. Evaluated against the Stanford Casehold dataset of 2,400 examples testing whether a case was overturned or upheld, our citator achieved a 94% accuracy rate (Stanford Casehold Benchmark),’ they claimed.
With the new feature, Paxton reviews the case itself, every case that cites to the case in question, and cases that are semantically similar to the case in question. Once the Paxton AI Citator’s analysis is complete, users can navigate to the Paxton Citator tab to see an in-depth analysis of all of the Important Cases, where Paxton highlights significant case relationships.
Paxton explained that, for example, if a user runs the citator on Roe v. Wade, Paxton will return Dobbs v. Jackson Women’s Health Organization as an important case. Paxton recognizes that Dobbs overturned Roe, stating that the relationship ‘can only be accurately categorized as ‘overturned.’’
The table below presents performance metrics for the Paxton AI Citator, as provided by the company. The metrics they use are:
- ‘Precision measures the accuracy of positive predictions. For example, a precision of 0.90 for overruling cases means that when the AI predicts a case was overruled, it’s correct 90% of the time.
- Recall measures the ability to find all positive instances. A recall of 0.99 for overruling cases means the AI correctly identified 99% of all cases that were in fact overruled.
- F1-Score is the harmonic mean of precision and recall, providing a single score that balances both metrics. An F1-score closer to 1 indicates better overall performance.’
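To make those definitions concrete, here is a minimal sketch in Python of how the three metrics are computed from raw counts. The function name and the counts are hypothetical. The counts are chosen only so that the output matches the 0.90 precision and 0.99 recall used as examples in the definitions above, and they are not Paxton’s reported results.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 for one class (e.g. 'overruled').

    true_pos  - cases predicted overruled that really were overruled
    false_pos - cases predicted overruled that were not
    false_neg - overruled cases the model failed to flag
    """
    predicted_pos = true_pos + false_pos   # everything the model flagged
    actual_pos = true_pos + false_neg      # everything that was truly overruled

    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not Paxton's data):
# 100 truly overruled cases, 99 of them caught, plus 11 false alarms.
p, r, f1 = precision_recall_f1(true_pos=99, false_pos=11, false_neg=1)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the harmonic mean is pulled towards the lower of its two inputs, F1 sits nearer to precision than to recall in this example, which is why it is often reported alongside the two rather than in place of them.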
Paxton AI Citator Performance on the Stanford Casehold Benchmark

In conclusion, the company said: ‘Our Citator standardizes case analysis based on sophisticated AI reasoning, ensuring each case is evaluated under consistent criteria. This method significantly reduces the variability and subjective interpretations common in human review, offering a more reliable and predictable research tool.’
So, there you go. Is it as good as they say it is? Well, they’ve given some accuracy scores and been open about how they did it – see their blog post here for a much more detailed explanation of how they did their testing. So that’s a positive step. Let’s see what the market thinks of it.
Just recently, Screens also published accuracy scores in Artificial Lawyer and provided detailed evidence of how they achieved their scores – see here. Looks like this is becoming a trend. This is all very useful…if the companies are providing clear evidence and explanations of how they have achieved their results.