
An analysis of the Vals genAI benchmarking study by Noah Waisberg, CEO, Zuva.
Artificial intelligence tools for contract review have made real progress in recent years. Yet new findings from the Vals Legal AI Report vendor evaluation re-raise a critical question: how reliably accurate is Generative AI at contract data extraction?
In the Vals study, three of the four Generative AI tools reviewed (Harvey, Vincent AI from vLex, and Oliver from Vecflow) missed a pretty standard Most Favored Nation (MFN) clause. (The human lawyer and Thomson Reuters’ CoCounsel both found it.) As someone who has been working on AI contract review since 2011, I found this result surprising, and troubling. Here’s why this matters and what it teaches us about relying on AI for contract review.
MFN clauses are contractual provisions that ensure one party receives terms no less favorable than any others. If a company offers a better deal to someone else, it must extend those same benefits to the MFN-holder. While pretty uncommon in most agreement types, MFNs can carry significant risks—particularly in M&A situations. Missing one could lead to costly post-transaction surprises.
In my years as an M&A lawyer, missing an MFN clause was not an option. These clauses can affect pricing and terms of service, and can apply post-acquisition. For instance, an MFN clause in a master services agreement could bind a future acquirer—e.g., a big company like IBM or Accenture—to legacy contract terms, creating substantial risks.
Despite their importance, MFN clauses are tricky. They may not appear in every contract, and when they do, they’re often phrased in subtle ways, making them harder for AI systems to detect than simpler clauses like governing law. Problematically, some other super-important, can’t-miss, M&A-relevant clauses (like change of control, exclusivity, and non-compete) are also hard for AIs to reliably identify, in my experience.
The Vals AI evaluation examined how four Generative AI tools performed on a number of different tasks. One of the tasks was data extraction. For data extraction:
“The dataset had 30 questions, which were asked over a total of 29 documents. Each question was asked with reference to 1-7 documents provided as context.”
I think this means that they asked 30 specific contract questions across 1–7 documents. In general, at Zuva, we would tend to test across hundreds of documents (to give a more reliable view of accuracy), but this wasn’t our study!

While the tools apparently did well on many tasks, the report shows that three out of four missed the clause. This specific clause read:
“GlobalLogic shall provide Client access to its personnel no less favorable than what it provides any other customer.”
This is a pretty standard MFN clause.
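One way to see why this clause is easy to miss: it never uses the label “most favored nation” at all, only the operative “no less favorable” wording. Here is a minimal illustrative sketch (my own toy detectors, not any vendor’s method) run against the clause text quoted above:

```python
# Toy detectors (illustrative only) showing why simple label matching
# fails on the GlobalLogic clause from the Vals study.
clause = ("GlobalLogic shall provide Client access to its personnel "
          "no less favorable than what it provides any other customer.")

def keyword_match(text: str) -> bool:
    """Naive detector: looks for the label an MFN clause is 'supposed' to carry."""
    return "most favored" in text.lower()

def phrasing_match(text: str) -> bool:
    """Slightly smarter: looks for the operative 'no less favorable' wording."""
    return "no less favorable" in text.lower()

print(keyword_match(clause))   # False - the label "most favored" never appears
print(phrasing_match(clause))  # True - the operative wording does
```

Even the second detector is far too brittle for real use (MFN obligations can be phrased many other ways), which is exactly why reliably spotting these clauses requires an actual understanding of the language, not a label search.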
—
And here is a video guide by Noah, explaining much of what is in this article.
If these leading Generative AI systems can miss something as vital as an MFN clause in a controlled testing environment, what might they miss in real-life due diligence? Actual due diligence contract reviews often deal with hundreds of documents, some poorly scanned or inconsistently formatted, making the AI’s task harder, not easier. My information is that the vendors studied administered their own tools in the Vals study, likely meaning the vendor had some of its best and most appropriate team members inputting the prompt in the way they thought would work best in their system, which is also not necessarily how real-world diligence would go down.
Unlike a non-material typo, missing an MFN clause could materially affect deal value and risk exposure.
What Zuva Found — and Why Consistency Matters
After reviewing the Vals report, I was curious: Could Zuva find the MFN clause where others failed? So I put Zuva’s contract analysis tool to the test on the same document, in various formats:
- Full-length MSA (79 pages)
- Two shortened versions
- PDF and Word formats
In each case, Zuva found the MFN clause correctly and consistently, regardless of document length or format.
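The format-robustness check described above can be sketched generically: run the same extractor over several renditions of one document and confirm the answers agree. This is a hypothetical harness under my own assumptions (the extractor and the document variants are stand-ins, not Zuva’s actual API):

```python
# Hypothetical consistency harness: the same extraction should succeed
# regardless of how the document is packaged.

def extract_mfn(text: str) -> bool:
    # Stand-in extractor: flags the operative "no less favorable" wording.
    return "no less favorable" in text.lower()

clause = ("GlobalLogic shall provide Client access to its personnel "
          "no less favorable than what it provides any other customer.")

# Each variant simulates a different rendition of the same MSA.
variants = {
    "full_msa": "... other terms ...\n" + clause + "\n... more terms ...",
    "shortened": clause,
    "word_export": clause.replace("\n", " "),
}

results = {name: extract_mfn(text) for name, text in variants.items()}
consistent = len(set(results.values())) == 1
print(results, "consistent:", consistent)
```

The point of a harness like this is less the extractor itself than the agreement check at the end: a tool that answers differently across formats of the same contract is not one you can rely on.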
Testing GPT-4o: An Inconsistent Experience
Since GPT-4 seems to serve as the backbone of some of the Generative AI tools tested (and since I don’t have access to CoCounsel, Harvey, Vincent, or Oliver), I ran the same contract through the paid version of OpenAI’s GPT-4o multiple times. The results were inconsistent.
In my first bunch of tries, 4o failed to ingest the PDF, shortened PDF, and Word versions of the document. A typical reply was something like:
It seems there is still an error when trying to process the PDF file directly. To ensure I can help you extract the “most favored nation” (MFN) clauses, here are two options:
1. You can upload the file again, and I will attempt another extraction.
2. Alternatively, if you prefer, you can copy and paste the text of the contract or the relevant sections here, and I can analyze it for any MFN-related language.
I then gave 4o a link to the contract on EDGAR (which likely was part of the 4o training data, for what it’s worth), and got the following reply:
The Master Services Agreement available does not appear to contain any direct references to “most favored nation” (MFN) clauses or language explicitly describing MFN provisions. The document includes general contractual terms such as definitions, confidentiality, deliverables, intellectual property, and terms related to services, but there are no explicit sections or clauses identified as “most favored nation.”
If you’d like, I can help analyze other specific sections or search for other clauses. Let me know!
On LinkedIn, Arthur Souza Rodrigues suggested that the Harvey/Vincent/Oliver miss was due to formatting. To test this, I put just the terms and conditions from the document into Word and cleaned it of all extraneous footers (making the task easier for the LLMs). I ran into “persistent system issues preventing direct processing of the file,” so I copied all the text from the shortened version into a prompt and got:
Thank you for providing the full text of the contract. After reviewing this content, **there are no references to “most favored nation” (MFN) clauses or language typically associated with MFN provisions** in this document.
MFN clauses often involve language indicating that one party will receive terms no less favorable than those offered to others (e.g., pricing, access, rights). This kind of provision is **not mentioned** in the sections you provided.
Interestingly, when I tried the same prompts again later that day, I got responses like the one below, which included the relevant MFN text (among a bunch of other stuff).
After reviewing the GlobalLogic Master Services Agreement (MSA) you provided, there are no direct or explicit references to “most favored nation” (MFN) clauses within the visible portions of the document. Typically, a “most favored nation” clause would state that one party (here, likely the client) would be entitled to terms no less favorable than those provided to any other customer, often regarding pricing, service levels, or other contractual terms.
However, there are a few clauses and references that deal with pricing, fees, adjustments, and client rights that may functionally relate to or overlap with some of the concerns addressed by MFN clauses, though they are not labeled as such. For example:
1. Annual Review and Pricing Adjustments (Section 3.9.1 and Amendments)
- Section 3.9.1 (Annual Review of Relationship) allows both parties to review and potentially adjust fees and rates on an annual basis, subject to caps and mutual agreement.
- In the First Amendment, this was expanded and clarified, with references to labor rates adjustments based on market surveys, with a capped upward adjustment.
Excerpt (First Amendment to Section 3.9.1):
“GlobalLogic will work with Client to create a budget for salary increases or decreases for individuals. Market survey data will be used to create the salary and benefit adjustment budget. The Parties shall collaborate to manage any annual adjustment upward to the Adjustable Costs…”
Relation to MFN: While not explicitly an MFN, this clause ensures that fee adjustments are linked to market data, which can have an effect similar to an MFN if properly monitored.
2. Client Rights to Review Fees (Section 3.1 and Exhibit M)
- Section 3.1 allows Client to review the fees in accordance with Exhibit M (Benchmarking and Annual Review Procedures).
- Exhibit M describes procedures for benchmarking, which typically involves comparing the terms of the agreement to market standards or other agreements.
Excerpt from Section 3.1:
“Client is entitled to review the Fees in accordance with Exhibit M (Benchmarking).”
Relation to MFN: While this is not a direct MFN clause, benchmarking can serve as a proxy to ensure pricing competitiveness, akin to MFN protections.
3. Assignment of Personnel (Section 3.9.2 and Amendment)
- Section 3.9.2 mentions that GlobalLogic shall provide access to its personnel no less favorable than what it provides any other customer.
Excerpt (Section 3.9.2):
“GlobalLogic shall provide Client access to its personnel no less favorable than what it provides any other customer.”
Relation to MFN: This is the closest concept to an MFN found in the agreement—although limited to personnel assignment/access, it creates an obligation of non-discrimination between customers in this context.
4. Benchmarking and Annual Review (Exhibit M)
- Detailed benchmarking rights are provided to the client, allowing review and potential adjustment if fees are found to be out of step with market norms.
Relation to MFN: Although not framed as an MFN, this gives Client a mechanism to ensure competitive pricing.
- No explicit “Most Favored Nation” clause is found in this MSA.
- Related concepts exist, especially in pricing adjustment, personnel access, and benchmarking, but these do not guarantee that Client will always get the lowest prices or best terms offered to others.
- If MFN rights are crucial, a separate and explicit MFN clause should be drafted and added to make this protection unambiguous.
If you are looking to draft an MFN clause to add to this agreement, I can help create a tailored provision. Let me know!
On the positive side, it’s great to see 4o improve. On the negative side, I would find the inconsistency troubling if I were going to use one of these systems on a contract review where I needed to be right.
I am a regular user of Generative AI and think it’s often pretty amazing. That said, the technology has improved a bunch but still has more to go. I’m not sure I would yet trust these systems on high-importance contract review, at least based on this very limited review.
Vals gave us a pretty narrow window into the actual performance of the tested legal Generative AI systems and—when I could see the underlying performance, not the narrative about the performance—I found that I would need to see more to trust them, at least on contract review. This is just one question on one document, AI makes mistakes, and it would be great to see more thorough testing here … though, really, testing pure AI accuracy at finding clauses probably isn’t the best measure of AI systems at contract review. But that’s a topic for another (forthcoming) blog post (or several)!
—
AL Note: this article is by Noah Waisberg, former co-founder and CEO of Kira Systems, who is now the founder and CEO of Zuva.
AL is happy to take this debate forwards, and if you have an in-depth response to Noah’s views, this site would be glad to publish that as well. Hopefully, openness about genAI benchmarks across the legal tech world will help lift us all up.