Here we go again. An academic blasts a legal AI tool for not working as expected and the legal tech company moves to defend itself, in turn leaving us, the buyers in the middle, confused. So, what’s happened this time?
Basically, Professor Benjamin Perrin of the University of British Columbia published an article on November 12 in the Canadian Bar Association’s National Magazine explaining his experience of using the local version of Lexis+ AI for a handful of research tasks – and how he was not at all happy with the results. He concluded that he could not recommend the system to his students, given its challenges – at least based on what he saw.
You can see his claims here.
The claims centre on a number of attempts to get prompts to produce legal research-related outputs, such as having the LLM-based system draft a motion relating to the Supreme Court and provide summaries. As Perrin explained: ‘I began by asking Lexis+ AI to draft a motion for leave to intervene in a constitutional challenge to a drug possession offence.’
But, LexisNexis explained to this site that Lexis+ AI has four main ‘drafting use cases: arguments, memos, letter/email, and clauses’ – and the drafting of motions is not among them.
He then tried other prompts. For example: ‘I asked Lexis+ AI to summarize the Supreme Court of Canada’s Reference re Senate Reform. Instead of generating an original summary, it simply copied verbatim the headnote from the case.’ That was then followed up by further prompts to develop this line of questioning.
But, Lexis says that its system is not yet designed to handle multi-turn conversations in relation to summaries in Canada. It’s also unclear how the system was prompted in these cases and what led to headnotes being returned as the response.
He also noted: ‘I posed some legal questions to Lexis+ AI in areas of law that I teach and know reasonably well, such as “What is the test for causation in criminal law?” The responses were concise, confident and linked to actual cases, but the content was riddled with mistakes.’ Now that one does indeed seem to land.
Late on November 18 (i.e. last night), Jeff Pfeifer, Chief Product Officer, LexisNexis North America & UK, put out a comment that was sent to Artificial Lawyer.
He points out that: ‘LexisNexis has not been contacted by Professor Perrin, but we welcome the opportunity to explore his suggestions to improve the product experience.’
Note, however, that Prof Perrin, via social media, has just told this site the following: ‘I provided concerns to LexisNexis during a faculty training session on September 10, 2024.’
The response also sets out how heavily LexisNexis has engaged with Canadian lawyers on Lexis+ AI, and that clients so far seem happy. It’s hard to gauge that from London, but over here Artificial Lawyer recently interviewed two legal innovation heads from major UK firms using the system and both seemed satisfied with what they were getting out of the tool. But, what we have here are responses, both positive and negative, from a small survey group. If we had responses from 100 customers of Lexis+ AI, that would be more meaningful in both directions.
That said, it looks like Perrin has indeed found some error-prone areas – beyond the failure to draft motions – and those clearly need to be addressed.
The problem here is that this was not a scientific test; it was a handful of prompts that didn’t get the professor where he wanted to go. Yet, his criticisms could have come from any user, so this shows, at the least, that there is still a gap between how people use the system and how LexisNexis, and other suppliers of legal AI tools, expect, or perhaps hope, they will use them.
Looking in from the outside, it’s like listening in on a trial where you’ve heard a snippet from the prosecution’s main witness and some comments from the defence’s main witness, but you can’t really see the whole picture. And that makes judgement here really hard.
This underlines the need for clear benchmarks in terms of AI accuracy. I.e. if the prof had known what was possible with the tool and what to expect from the prompts he used, would this have turned out the same way? And then, whose responsibility is it to make sure users know how to get the best from a platform? That has to sit with the seller of the software.
But, overall, while a bit reminiscent of the ‘Stanford Debacle’ from over the summer, this doesn’t seem to be quite on the same level. That said, it’s still embarrassing for LexisNexis and no matter what misconceptions Perrin may have had in relation to motions, some of his other criticisms do seem valid.
So, where does this leave us? Well, as said in the past, these systems are constantly evolving. RAG is getting better. The system prompting on the back end, which the users never see, is also getting more refined.
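For readers less familiar with the mechanics, here is a minimal, purely illustrative sketch of how a RAG pipeline pairs a hidden system prompt with retrieved sources. Everything in it – the corpus, the keyword-overlap scoring, the prompt wording – is invented for the example; it is not how Lexis+ AI, or any specific product, actually works under the hood.

```python
# Illustrative sketch only: a toy RAG flow with a back-end system prompt.
# None of this reflects LexisNexis's actual implementation.

from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


# Stand-in for a real legal content repository.
CORPUS = [
    Document("Case A v B", "Discusses the test for causation in criminal law ..."),
    Document("Reference re Senate Reform", "Headnote and reasons on Senate reform ..."),
]

# The back-end system prompt that end users never see.
SYSTEM_PROMPT = (
    "You are a legal research assistant. Answer only from the provided sources. "
    "Cite each source you rely on. If the sources do not answer the question, say so."
)


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Toy keyword-overlap retrieval standing in for a real search index."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.text.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, corpus: list[Document]) -> str:
    """Grounding step: the model only sees the question plus retrieved passages."""
    sources = retrieve(query, corpus)
    context = "\n\n".join(f"[{d.title}]\n{d.text}" for d in sources) or "(no sources found)"
    return f"{SYSTEM_PROMPT}\n\nSources:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # The assembled prompt would normally go to an LLM; here we just print it.
    print(build_prompt("What is the test for causation in criminal law?", CORPUS))
```

The point of the sketch is simply that the ‘refinement’ vendors talk about happens in these unseen layers – what gets retrieved and how the prompt is framed – which is exactly why users can’t easily tell what a tool is or isn’t set up to do.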
But a work in progress, with users who may not fully grasp what can and cannot be done, will create mistrust and these kinds of stories. It’s a symptom of putting out products where users’ expectations are so high that, when things don’t happen as expected, there is a strong outpouring of disillusionment.
The answer, as ever, seems to lie in better communication and the development of shared standards. E.g. more explanation of what the tools can do, what their limits are, where to expect things to go amiss, and how to make your own judgements about accuracy.
If you just throw a powerful (but sometimes imperfect) tool at an inquisitive and smart user, then you’ll get errors, and those errors will trigger anger and disbelief on the part of the user. In short, it looks like while tech improvement is part of the answer, better communication of the aspects noted above is just as important.
Any road, here are some of the other key parts of Jeff Pfeifer’s comment, in particular around how Lexis+ AI addresses accuracy:
‘First, Lexis+ AI provides linked validation of citing authority references to underlying cases and legislation to help substantiate outputs and mitigate hallucination risks. The product recommends further review if a citation is not hyperlinked.
Second, in the fast-paced large language model (LLM) evolution, Retrieval Augmented Generation (RAG) and fine-tuning are the best methods available to ensure the highest answer quality. LexisNexis focuses on AI answer quality through an enhanced proprietary RAG platform, which now includes AgenticRAG capabilities for complex and nuanced legal queries. Our proprietary RAG infrastructure allows us to ensure that Lexis+ AI responses are grounded in our extensive repository of current, exclusive legal content, ultimately achieving the highest-quality answer with the most up-to-date validated citation references. We are seeing week-over-week improvements in answer and citation quality, thanks to ongoing technology development.
Third, LexisNexis has incorporated other techniques to improve the quality of our responses—proprietary metadata, the use of knowledge graph and citation network technology, and taxonomy services, to name a few. Each aids in the identification of the most relevant legal authority to support user questions.
Importantly, LexisNexis is responsibly developing legal AI solutions with human oversight. LexisNexis, part of RELX, follows the RELX Responsible AI Principles, considering the real-world impact of our solutions on people and taking action to prevent the creation or reinforcement of unfair bias. It is important that humans have ownership and accountability over the development, use, and outcomes of AI systems. We have an appropriate level of human oversight throughout the lifecycle of our solutions. This is core to ensuring the quality and appropriate performance of our solutions.
Professor Perrin suggested that users prefer AI-summarized cases to human-summarized cases. We appreciate his suggestion and will consider this in future product development.’
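As an aside, the ‘linked validation’ point in the first paragraph of the comment above boils down to a post-generation check: does each citation in the answer resolve to a known, linkable authority, and if not, flag it for review? Here is a minimal sketch of that idea, with an invented authority list, citation pattern and flagging rule – it does not describe LexisNexis’s actual pipeline.

```python
# Illustrative sketch of post-generation citation checking, in the spirit of the
# 'linked validation' point above. The authority list, regex and flagging rule
# are invented for the example.

import re

# Stand-in for a database of verified, linkable authorities.
KNOWN_AUTHORITIES = {
    "R v Smith, 2015 SCC 34",
    "R v Jordan, 2016 SCC 27",
}

# Very rough pattern for neutral citations of the form "R v Name, 2016 SCC 27".
CITATION_PATTERN = re.compile(r"R v [A-Z][^,\n]*?, \d{4} SCC \d+")


def flag_unverified_citations(answer: str) -> list[str]:
    """Return citations in the answer that don't match a known authority,
    i.e. the ones a user would be told to review further."""
    found = CITATION_PATTERN.findall(answer)
    return [c.strip() for c in found if c.strip() not in KNOWN_AUTHORITIES]


if __name__ == "__main__":
    draft = (
        "The test is set out in R v Smith, 2015 SCC 34 and confirmed in "
        "R v Invented, 2019 SCC 99."
    )
    print(flag_unverified_citations(draft))  # ['R v Invented, 2019 SCC 99']
```

In other words, an unlinked citation isn’t proof of a hallucination, but it is the signal the product uses to tell the user to go and check for themselves – which brings us back to the communication point above.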