A Guide to Evaluating Contract Analysis AI Solutions

By Noah Waisberg, CEO & Co-Founder, and Dr. Adam Roegiest, VP Research & Technology, Zuva

More and more applications are incorporating contract analysis AI features. Until recently, software builders looking to add contract AI had to build this tech themselves. That has led to huge amounts of time and effort invested across companies trying to do the same thing, often with mediocre results. It turns out to be difficult to build good contract analysis AI. Noah learned this the hard way himself. Today, there’s another option: integrate pre-built contract analysis AI from another vendor. That leads to two big questions:

Should you build your own contract analysis AI, or integrate a third party’s? (Or, if you can afford it, there’s a third choice: buy a company that has already built a contract analysis AI.)
If you decide to integrate a third party contract analysis AI, how should you evaluate the different vendors?

In the interests of brevity, this piece primarily covers the second question.

Why Are We Credible On This Topic?

Noah Waisberg has been in the contract analysis AI market since its early days. He co-founded Kira Systems over a decade ago and helped build it into the dominant contract analysis AI vendor (with 18 of the top-25 M&A law firms and parts of all the Big 4 as customers at the time it was acquired by Litera). Today, he’s the CEO of Zuva, a company spun out of Kira Systems.

Zuva’s DocAI product (which is built off of the machine learning engine started at Kira) enables software developers to embed pre-built contract analysis AI models into their applications via an API. Dr. Adam Roegiest holds a PhD in computer science from the University of Waterloo, where he worked with eDiscovery experts Dr. Gordon V. Cormack and Dr. Maura R. Grossman. He played a key role building Kira’s ML engine and today – among other responsibilities – leads Zuva’s Research Team. Of course, while Noah and Adam have a fair bit of experience in the contract analysis AI space, they are not neutral observers. So take their thoughts accordingly. (Sorry for this being third-person – it annoys us too!)

Buy or Build?

We have been working on this problem since 2011, and in our experience it’s really hard to build good contracts AI. If you invest the time and resources into building this tech, there’s a good chance you won’t get it to work as well as you (or your customers) would like. This would put you in good company: Gartner predicts that through 2022, 85 percent of AI projects will deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them. Off-the-shelf AI lets you buy the underlying tech that powers this functionality and build on top of it to deliver a unique and differentiated end user experience to your customers.

From the numerous conversations we’ve had with prospects and customers who have contemplated this key question, their decision to buy (vs. build from the ground up) contracts AI is based on three key factors:

The tech has to work. It has to work reliably and consistently, so that it can meet the needs of customer use cases and give customers confidence in the results.
Ease of implementation. The faster this technology can be embedded into existing workflows the better. Delivering a fast time to value is of critical importance.
Limited resources. Accessing low code/no code features that eliminate the need for data scientists and other technical experts is a significant value driver. If you do have data scientists or other machine learning experts, buying means you can focus them on building differentiated machine learning features (as opposed to contract analysis AI features that your competitors built more easily, and probably better, than you). Basically, focus on where you can add value.

This piece goes into a lot more detail on whether to build or buy contract analysis AI.

Let’s turn to how to evaluate different third party contract analysis AIs.

Who are the Primary Integratable Contract Analysis AI Providers?

At this time, the two primary integratable contract analysis AI providers are Google Contract DocAI and Zuva DocAI. There is also ContraxSuite by LexPredict (an open-source offering), as well as technology from Microsoft and AWS that can be used here. Historically, IBM Watson had a contract analysis offering.

Wait, but what about the horde of other contract analysis AI vendors?! Seal, eBrevia, Kira, LegalSifter, Eigen, Luminance, Heretik, Della, ContractPodAi, Evisort, Linksquares, and so many more. Why aren’t they listed here? They aren’t included because they are primarily workflow tools that incorporate contracts AI, as opposed to integratable contracts AI offerings. While some may have APIs, our experience suggests that anyone trying to build another piece of software on top of them may be in for a suboptimal experience; this isn’t what they’re built for.

In contrast, an integratable contract analysis AI (like those named at the start of this section) exists to be incorporated into other (often workflow) systems. In fact, in coming years we expect that most workflow contract analysis tools will come to embed a third party integratable contracts AI (as opposed to a homegrown AI, as is generally the case today).

Many of the items discussed in the rest of this piece would also be useful for evaluating the AI in contracts workflow solutions. However, these solutions should also be evaluated on the basis of their workflow features.

How to Evaluate Third Party Contract Analysis AI Offerings

Here are factors to consider when evaluating contract analysis AI offerings.

Pre-built Models

A valuable element of contract analysis AI offerings is that they come pre-trained to find information in contracts. This means that customers can find information they care about from Day 1, all without having to run an ‘annotation factory’ of people instructing the software to find new information. In our experience, this takes a lot of work and significant expense. Some offerings come with tens to thousands of out-of-the-box models, which can be a very valuable feature for certain customers.

Here are some things to consider as you evaluate the offerings’ pre-built models:

Do the provided models find what you need them to find? For planned future use cases too?
- How accurate are the pre-built models?
  - Do their measures of accuracy and effectiveness align with your own?
  - Will these accuracy measures be helpful in designing systems that are usable and predictable? Do they correlate well with your designed user experiences?
  - How should you measure accuracy? This is itself worthy of a long post. For the moment:
    - Documents you test on should ideally approximate the diversity and difficulty of the ones your software will see. E.g., NDAs or credit agreements tend to be easier (partly because they are more homogeneous, though note there can be significant differences even in these across jurisdictions), while supply agreements, IP licensing agreements, and leases can be harder.
    - If your system may ingest scanned documents, make sure to test accuracy on these. Poor quality scans can be difficult for AI systems to accurately find information in.
    - Some provisions are harder than others. E.g., governing law, term, indemnification, and assignment tend to be pretty easy. Exclusivity, non-compete, change of control, and MFN can all be exceptionally hard to build accurately.
    - Harder tasks can be a window into how an AI system performs in non-standard situations.
    - We recommend against giving a vendor documents you will test on in advance. While current vendors in this space seem unlikely to cheat on a test, doing this makes cheating possible.
- Depending on the envisioned use case, one might prefer a model that makes fewer erroneous extractions at the cost of missing some information (e.g., in legacy contract imports). Alternatively, one may be more tolerant of erroneous extractions if they want to be sure all the data points are found (e.g., in due diligence cases).
Do the provided models extract just text?
Do the provided models extract more than text (e.g., structured outputs)?
- Are these normalized values (e.g., dates, currencies)?
- Can you do classification on text extractions (e.g., whether the agreement auto-renews if there’s a renewal clause)?
Are there models that identify the type of agreement or contract (e.g., NDA vs employment agreement vs supply agreement)?
Do you know what the provided models were trained to find (i.e., the scope of the model)?
- While subject matter experts might know what a particular clause means (e.g., change of control, indemnity), there can always be some disagreement about what exactly should be captured by a human, let alone AI.
  - Our research found lawyers typically agree on the same area of a document for a clause but will disagree on the amount of content to annotate.
Do you know what jurisdictions the models were trained on?
- Terms of art can be jurisdiction specific and rely on nuances that are specific to the laws in a particular state/country. This can mean that a model trained only on US documents may fail to identify nuances in UK documents. E.g., a US agreement might use ‘lay-off’ to express what would be called a ‘redundancy’ in the UK.
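
The accuracy questions above can be made concrete with precision (how many extractions were right) and recall (how many expected passages were found). The following is a minimal sketch, not any vendor’s scoring method. It assumes extractions and reviewer annotations are available as character offsets, and it counts any overlap as a match, since reviewers often agree on the area of a clause but not on exactly how much text to annotate. The names `Span`, `overlaps`, and `precision_recall` are hypothetical, and the sample offsets are made up for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when two character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(predicted: List[Span], expected: List[Span]) -> Tuple[float, float]:
    """Score one provision type on one document.

    A predicted span counts as correct if it overlaps any expected span;
    an expected span counts as found if any predicted span overlaps it.
    By convention (an assumption here), an empty list scores 1.0.
    """
    correct = sum(1 for p in predicted if any(overlaps(p, e) for e in expected))
    found = sum(1 for e in expected if any(overlaps(e, p) for p in predicted))
    precision = correct / len(predicted) if predicted else 1.0
    recall = found / len(expected) if expected else 1.0
    return precision, recall


# Made-up test document: reviewers marked two change of control passages;
# the model under test returned three extractions, one of them spurious.
expected = [(100, 250), (900, 1040)]
predicted = [(120, 240), (400, 450), (880, 1000)]

precision, recall = precision_recall(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=1.00
```

A legacy contract import might favor higher precision, while a due diligence review might favor higher recall. Tracking both numbers per provision and per document type makes that trade-off visible when comparing offerings.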

Trainability

Contract analysis AIs may not come pre-trained to find everything you need. Or maybe they do, but you may need to find additional information in the future. How much effort it will take to train the software, and who will do the training, are important considerations. Also important: can you incorporate contract AI training into your own app, or do you always need to do it through the AI vendor’s system? In our experience (having done it both ways), the more you can put AI in your users’ hands, the more scalable it is.

Some key questions as you evaluate how trainable a provider’s software is:

How much data does it take to train a good quality new model?
Do technical personnel need to be involved in training, or can subject matter experts do this directly?
How easily is the AI trainable for things I need it to know? Does it learn okay in languages I need it to work on?
Can I build AI training features into my own interface (meaning that my users can train the AI directly) or can only my personnel train the AI?
If I train models, who owns them? Conversely, can I share models if I choose to? (See also “Data privacy and security considerations” below)

Solution Speed

‘Speed’ matters, but can mean many different things in this space. Take time to figure out what it means for you.

Determine whether you care about single document speed or document throughput.
- Single document speed means that when a given model is run against a document, the result comes back as quickly as possible (useful for one-off requests).
- Throughput dictates how quickly a large bulk of requests will process in the system (useful for bulk processing).
- Most pre-built solutions will optimize for one of these as optimizing for both often requires application specific knowledge.
- Testing potential solutions (e.g., single documents vs tens vs hundreds) and reading documentation will help you determine which approach a solution takes and whether it fits for your expected processing flow.
- You may need custom constraints (e.g., a mix of the two depending on user actions). That may be an indicator that you need to build a solution yourself or use a lower level solution (similar to the underlying machine learning engine powering document AIs).
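
The difference between the two measures can be checked with a small timing harness. This is a minimal sketch under stated assumptions: `fake_analyze` is a hypothetical stand-in that sleeps instead of calling a real service, and `single_document_latency` and `bulk_throughput` are illustrative helper names. In a real test you would pass in your own function that calls the offering you are evaluating.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def single_document_latency(analyze: Callable[[str], object], document: str) -> float:
    """Seconds taken by one request, measured end to end."""
    start = time.perf_counter()
    analyze(document)
    return time.perf_counter() - start


def bulk_throughput(analyze: Callable[[str], object], documents: List[str], workers: int = 8) -> float:
    """Documents completed per second when requests are sent concurrently."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(analyze, documents))  # list() waits for every result
    elapsed = time.perf_counter() - start
    return len(documents) / elapsed


def fake_analyze(document: str) -> int:
    """Hypothetical stand-in for a vendor call; sleeps to imitate processing time."""
    time.sleep(0.05)
    return len(document)


latency = single_document_latency(fake_analyze, "sample contract text")
throughput = bulk_throughput(fake_analyze, ["sample contract text"] * 40, workers=8)
print(f"latency: {latency:.3f}s per document, throughput: {throughput:.1f} documents/s")
```

Running the same harness with one document, tens, and hundreds, and with different `workers` values, shows whether an offering holds its single document speed under load or trades it for bulk throughput.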

Developer Speed & Experience

Developer time is a precious resource. How can you use it most efficiently?

What does the developer experience look like?
- When building from the ground up, your development teams can create software that is easy for them to use and maintain, but doing so slows down building the end user experience (i.e., a lot of time is spent building the behind the scenes services).
- When buying AI solutions, your hope is that this gives your developers the speed boost to focus only on the end user product. But this only happens if your developers can easily understand and use the solution that you’ve bought. Before buying, consider having your development leads look at:
  - The APIs and whether they seem easy to use.
  - The documentation and whether it provides sufficient detail to understand what happens if something goes wrong.
  - Whether there are code examples that help guide best practice use.
How robust is the offering? Almost all software has bugs, but how many and how serious are important distinctions.
- While this can be hard to determine until developers have spent decent time with a solution, it is a motivator to build frameworks around the solution that allow you to swap out a bad one.
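
One way to build such a framework is to have application code depend on a small interface of your own, with one adapter per AI offering behind it. The sketch below is an illustration under assumptions, not any vendor’s API: `ContractAnalyzer`, `Extraction`, `KeywordAnalyzer`, and `summarize` are all hypothetical names, and the keyword matcher is a toy stand-in for a real adapter.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Extraction:
    provision: str  # e.g. "governing_law"
    text: str       # the passage the engine returned


class ContractAnalyzer(Protocol):
    """The only surface the rest of the application is allowed to call."""

    def extract(self, document: str, provisions: List[str]) -> List[Extraction]: ...


class KeywordAnalyzer:
    """Toy engine standing in for a real vendor adapter (hypothetical)."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = keywords

    def extract(self, document: str, provisions: List[str]) -> List[Extraction]:
        results = []
        for sentence in document.split(". "):
            for provision in provisions:
                keyword = self.keywords.get(provision)
                if keyword and keyword in sentence.lower():
                    results.append(Extraction(provision, sentence.strip()))
        return results


def summarize(analyzer: ContractAnalyzer, document: str, provisions: List[str]) -> Dict[str, List[str]]:
    """Application code: works with any object that satisfies ContractAnalyzer."""
    summary: Dict[str, List[str]] = {provision: [] for provision in provisions}
    for extraction in analyzer.extract(document, provisions):
        summary[extraction.provision].append(extraction.text)
    return summary


engine = KeywordAnalyzer({"governing_law": "governed by", "term": "term of"})
document = (
    "This Agreement is governed by the laws of Ontario. "
    "The term of this Agreement is two years."
)
summary = summarize(engine, document, ["governing_law", "term"])
print(summary)
```

Swapping offerings then means writing a new class with the same `extract` method, while `summarize` and the rest of the application stay unchanged.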

Cost

There are two primary costs to think about, whether you choose to buy or build an AI solution: build/development cost and running cost.

First is the development/engineering cost. Suppose building an AI solution takes only 6 months (and that’s unlikely) for a development team of 10 (including data scientists, product management, software developers, and data annotators). You’re then very easily looking at $500,000 to $1,000,000 of spend just to build the AI, before you have built an end product on top of it (and that assumes you can even hire and retain the right talent). In contrast, when buying an AI solution, a smaller team typically just needs to focus on integrating it into the final product, which ideally takes less time than building the entire AI solution.
The second cost is the cost to run the AI solution. When building an AI solution, you have to ensure that its performance scales well with increased usage and that you have resources available to facilitate this (e.g., ensuring you have necessary computing hardware, and staff to monitor the infrastructure). While some maintenance costs can be offset using cloud services, often you’ll still need individuals to monitor these services and ensure uptime of the solution. Conversely, in buying an AI solution, you’re often just paying for how much you use the solution once you get it integrated into your end product. Depending on the nature of the solution and your intended use case, this could be very cheap or very expensive. This is where having a good idea of how well existing solutions fill gaps will help inform the trade-off between buying and building.
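
The build estimate above is simple arithmetic, and the same arithmetic helps compare it with usage-based pricing. This is a rough sketch under loud assumptions: the $100,000 to $200,000 annual cost per person is only what the $500,000 to $1,000,000 range implies for 10 people over 6 months, and the $0.50 per document price is a made-up placeholder, not any vendor’s pricing. The comparison also ignores the running costs of a self-built solution.

```python
def build_cost(team_size: int, months: float, annual_cost_per_person: float) -> float:
    """Development spend: people x fraction of a year x annual cost per person."""
    return team_size * (months / 12) * annual_cost_per_person


def break_even_documents(build_spend: float, price_per_document: float) -> float:
    """Documents you could process at a usage-based price for the same spend."""
    return build_spend / price_per_document


# Assumed per-person costs implied by the range in the text above.
low = build_cost(team_size=10, months=6, annual_cost_per_person=100_000)
high = build_cost(team_size=10, months=6, annual_cost_per_person=200_000)

# Hypothetical usage price, for illustration only.
break_even = break_even_documents(low, price_per_document=0.50)

print(f"build cost: ${low:,.0f} to ${high:,.0f}")               # $500,000 to $1,000,000
print(f"break-even at $0.50/document: {break_even:,.0f} documents")  # 1,000,000 documents
```

Plugging in your own team costs, expected document volumes, and quoted prices gives a first rough view of whether usage-based pricing would be very cheap or very expensive for your use case.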

Data privacy and security considerations

When you build an AI solution, you understand (or someone in your organization does) all the data retention and storage policies involved and their intersection with any compliance structures that you have in place. This makes it easier to guarantee to your clients how their data is stored, used, and deleted.

When buying an AI solution, you lose some of this control and require documentation from your solution provider. Consider asking questions like the following:

Can I control in what geographic region data is stored?
Can I control for how long data is stored?
Can I control whether and how my data is used in training any part of the AI solution (e.g., keeping it out of a global model used by all customers of the AI solution)?
Who has access to the data I provide?
Which compliance standards has the vendor met (e.g., SOC-2 Type 2, HIPAA)? How much this matters will depend on your domain.
If you are willing and able to share models (or even training data), how does the vendor protect the confidentiality and privacy of the underlying (likely, customer) data? Do they use techniques, such as Differential Privacy, or do they rely solely on controlled access to prevent untoward use of a model?

Available hosting options

One of the biggest considerations when choosing to buy an AI solution is how the solution will be managed and hosted.

By allowing the solution provider to manage and host the solution, you reduce your own monitoring and support costs. The downside is that you are at the mercy of wherever that solution provider hosts their solution (e.g., AWS in the US, Azure in Europe).
When a solution provider allows you to host and manage the solution yourself, you can choose what cloud and what geographic area to host it in. This does mean you incur management and hosting costs but this may be beneficial to you if you have an internal datacenter or existing infrastructure.
Note that not all AI solution providers will provide the option to choose between cloud or on-premises hosting. This can be due to their system design or tight coupling to other cloud services.
Obviously, when you build it yourself, you get to choose and manage all of these things yourself which may or may not be cost effective.
Some vendors (especially those providing cloud based software) have security policies that prevent them from integrating software that they do not host themselves. If this is your situation, confirm early in your evaluation process that you will actually be able to use the software.

Roadmap

Many people commit to SaaS tech products partly because of where the product is today and partly because of where they think the product (and the company behind it) is going. If integrating third party tech into your application, do you think you’re making a solid move, at least over the medium term (if not beyond)?

How much visibility do you have into the vendor’s roadmap?
Is the roadmap aligned with development you think is important?
Is the vendor focused on the area?
Is the vendor likely to remain committed to the area and work to solve problems you care about?

Conclusion

As you think through the above criteria and how they align with your needs, you can start to formulate a checklist for evaluating the offerings in the market. Finding a solution that meets these criteria should help you get to market faster, with less development investment, and deliver features that meet or exceed your customers’ expectations.

—

[ This is an educational guest post written for Artificial Lawyer by Noah Waisberg and Dr. Adam Roegiest at Zuva. ]
