Stanford GenAI Study Debacle – Thomson Reuters + LexisNexis Both Reply + HAI Comment

Following a request for comment on the Stanford University HAI group's problematic study (see here for what this is all about) into the effectiveness of Thomson Reuters' and LexisNexis' legal genAI systems, especially for case law research, both companies have now released statements to Artificial Lawyer, as has HAI (see below and the following article).

Thomson Reuters Comment:

‘Thomson Reuters is aware of the recent paper published by Stanford.

We are committed to research and to fostering relationships with industry partners that further the development of safe and trusted AI.

Thomson Reuters believes that any research which includes its solutions should be conducted using the product for its intended purpose, and that any benchmarks and definitions should be established in partnership with those working in the industry.

In this study, Stanford used Practical Law’s Ask Practical Law AI for primary law legal research, which is not its intended use, and would understandably not perform well in this environment.

Westlaw’s AI-Assisted Research is the right tool for this work. To help the team at Stanford develop the next phase of its research, we have now made this product available to them.’

In short, TR is clear that the HAI researchers at Stanford used the wrong tool, i.e. ‘Stanford used Practical Law’s Ask Practical Law AI for primary law legal research, which is not its intended use, and would understandably not perform well in this environment’.

They added that they have now given the researchers access to Westlaw’s AI-Assisted Research capability, so hopefully the study can be re-run with more useful results. They also noted that such studies should be done ‘in partnership with those working in the industry’.

Stanford’s HAI group also sent this statement to Artificial Lawyer:

‘The Stanford study acknowledges Thomson Reuters also offers a product called “AI-Assisted Research” that appears to have access to additional primary source material as well (Thomson Reuters, 2023). However, the research notes this product is not yet generally available, and multiple requests for access were denied by the company at the time the researchers conducted the evaluation.’

There is a longer discussion of what this means here (May 25th), which includes more detail and context.

LexisNexis Comment

Jeff Pfeifer, Chief Product Officer for LexisNexis North America and UK, has responded on behalf of the company. In a statement to Artificial Lawyer he said:

‘LexisNexis has not been contacted by Stanford’s Daniel Ho, and our own data analysis suggests a much lower rate of hallucination.

LexisNexis has extensive programs and system measures in place to improve the accuracy of responses over time, including the validation of citing authority references to mitigate hallucination risks in our product.

Lexis+ AI delivers hallucination-free linked legal citations. The linked statement means that the reference can be reviewed by a user via a hyperlink. In the rare instance that a citation appears without a link, it is an indication that we cannot validate the citation against our trusted data set. This is clearly noted within the product for user awareness and customers can easily provide feedback to our development teams to support continuous product improvement.

LexisNexis focuses on AI answer quality through an enhanced LexisNexis proprietary Retrieval Augmented Generation 2.0 (RAG 2.0) platform. Lexis+ AI responses are grounded in an extensive repository of current, exclusive legal content which ensures the highest-quality answer with the most up-to-date validated citation references.

The solution is continually improving with hundreds of thousands of rated answer samples by LexisNexis legal subject matter experts used for model tuning. LexisNexis employs over 2,000 technologists, data scientists, and J.D. subject matter experts to develop, test, and validate its solutions and deliver comprehensive, authoritative information.

LexisNexis agrees with several of the generative AI legal challenges described by Professor Ho. As part of our customer feedback-driven development focus, continuous RAG development is designed to improve answer quality. The Lexis+ AI RAG 2.0 platform was released in late April and the service improvements address many of the issues noted. RAG technology is improving at an astonishing rate and users will see week over week improvements in the coming months.

We have leveraged other techniques to improve the quality of our responses. These include the incorporation of proprietary metadata, the use of knowledge graph and citation network technology, and our global entity and taxonomy services, which help surface the most relevant authority to the user and are used as key parts of our RAG infrastructure.

LexisNexis is also focused on improving intent recognition, which goes alongside identifying the most effective content and services to address the user’s prompt. Additionally, we’re educating users on ways to prompt the system more effectively to receive the best possible results without requiring deep prompting expertise and experience.

Our goal is to provide the highest quality answers to our customers, including links to validated, high-quality content sources.’
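As an aside, LexisNexis has not published implementation details, but the mechanism the statement describes, where a citation in a generated answer is only rendered as a hyperlink if it can be matched against a trusted data set, can be sketched in rough form. The snippet below is a minimal illustration of that pattern only; every name in it (TRUSTED_CORPUS, validate_citations, and so on) is a hypothetical stand-in, not a LexisNexis API.

```python
# A minimal sketch, under broad assumptions, of the citation-validation
# step LexisNexis describes: a citation is only rendered as a hyperlink
# if it can be matched against a trusted data set; otherwise it appears
# unlinked, as a signal to the user. None of these names are real
# LexisNexis APIs.

import re
from dataclasses import dataclass

@dataclass
class Citation:
    text: str        # e.g. "410 U.S. 113"
    url: str | None  # link into the trusted corpus, or None if unvalidated

# Stand-in for a trusted data set: normalised citations -> document URLs.
TRUSTED_CORPUS = {
    "410 U.S. 113": "https://example.com/cases/410-us-113",
}

# Very simplified pattern covering U.S. Reports citations only.
CITATION_PATTERN = re.compile(r"\b\d{1,4} U\.S\. \d{1,4}\b")

def validate_citations(answer: str) -> list[Citation]:
    """Extract citations from a generated answer and try to validate
    each against the trusted corpus; unmatched ones get url=None,
    which a UI would render without a hyperlink."""
    return [
        Citation(text=m.group(0), url=TRUSTED_CORPUS.get(m.group(0)))
        for m in CITATION_PATTERN.finditer(answer)
    ]

answer = "As held in 410 U.S. 113, ... but compare 999 U.S. 999."
for c in validate_citations(answer):
    print(c.text, "-> linked" if c.url else "-> UNVALIDATED (no link)")
```

The design point is that an unlinked citation is itself a signal: the user can see at a glance which references could not be verified against the trusted data set, which is the behaviour the statement describes.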

Additional comment from Greg Lambert

Note: Greg Lambert, who is Chief Knowledge Services Officer at US firm Jackson Walker, and co-founder of the 3 Geeks and a Law Blog, initially underlined the problem yesterday. He told Artificial Lawyer today:

‘There’s a saying that I and a lot of people use when talking about the current Gen AI tools and that is “this is the worst these products will ever be.”

One of the problems with this is that we are now smack dab in the middle of the Trough of Disillusionment on the Gartner Hype Cycle and buyers of these services are now expecting products to start meeting and exceeding their expectations.

These expectations were also set by vendors like Thomson Reuters and LexisNexis around having excellent search results for legal research with little to no hallucinations. This has been over-promised by the vendors and they’ve hung this promise on retrieval augmented generation (RAG) processes to ground the results in real data.

The Stanford report, with its shortcomings acknowledged, is still pulling on a thread of truth that Gen AI has a creativity problem in a world that relies on verifiable and citable facts.

RAG will help get the products part of the way there, but as this study clearly shows, the products can still bring back incorrect but very convincing results that can cause even good attorneys to fall for these hallucinations. It’s a feature, not a bug, of Gen AI.

No longer can Thomson Reuters and LexisNexis or other legal information vendors rely upon “it’s going to get better” as the excuse for pushing out a product that has a relatively high number of made-up results.

I think they’ll figure it out. But it will take a multilayer system of checks and balances to find where GenAI is best used alongside other advancements like vector indexing and searching, semantic layers up front to define the scope of what the tool can do, and knowledge graphs to make the results much better, faster, and (maybe) cheaper for the researcher.’
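Lambert’s layering point lends itself to a rough sketch. The toy pipeline below illustrates, under broad assumptions, how a semantic layer, vector search and a knowledge-graph hop could stack before any generation step; every component here (the keyword intent check, the character-frequency ‘embeddings’, the two-document corpus) is a deliberately simplified stand-in, not anything a vendor actually ships.

```python
# A toy sketch of the "multilayer system of checks and balances"
# described above: a semantic layer scopes the query, vector search
# retrieves grounding documents, and a knowledge-graph hop pulls in
# related authority. Every component is a simplified stand-in.

import math
import re

SUPPORTED_SCOPES = {"legal_research"}

DOCS = {  # toy corpus
    "doc1": "standard of review for summary judgment in federal court",
    "doc2": "elements of negligence duty breach causation damages",
}
CITATION_GRAPH = {"doc1": ["doc2"], "doc2": []}  # toy citation network

def classify_intent(query: str) -> str:
    """Semantic layer: a real system would use a trained classifier;
    here, a keyword check decides whether the query is in scope."""
    legal_terms = {"judgment", "negligence", "court", "statute"}
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "legal_research" if legal_terms & words else "other"

def embed(text: str) -> list[float]:
    """Toy embedding: normalised character-frequency vector
    (real systems use a neural embedding model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def answer_with_checks(query: str) -> dict:
    scope = classify_intent(query)            # 1. scope the request up front
    if scope not in SUPPORTED_SCOPES:
        return {"answer": None, "reason": f"out of scope: {scope}"}
    q = embed(query)                          # 2. vector search for grounding
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    hits = ranked[:1]
    for d in list(hits):                      # 3. knowledge-graph expansion
        hits += CITATION_GRAPH.get(d, [])
    # 4. A real system would now generate an answer grounded ONLY in
    #    `hits` and validate its citations; here we return the grounding set.
    return {"grounding_docs": hits, "texts": [DOCS[d] for d in hits]}

print(answer_with_checks("What is the standard for summary judgment?"))
print(answer_with_checks("What's a good pasta recipe?"))
```

The design choice worth noting is that each layer can constrain or reject the query before generation runs, so hallucination risk is reduced by narrowing the inputs, not only by checking the outputs afterwards.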

Thanks to TR and LexisNexis for the comments, and also Greg.

P.S. As noted in AL’s social post today, and also during a discussion at the Legal Tech Speakeasy in London last night, our sector really needs to share objective information about genAI for lawyers at a community level. Sharing reliable info on genAI’s effectiveness lifts the whole sector up.