Contract AI’s Reliability Problem: When AI Gets It Wrong

By Pedram Abrari, CTO, Pramata.

This is the second article in our three-part series exploring why loading contracts into large language models isn’t an effective way to achieve accurate contract intelligence. We’ll dig deeper into why that approach falls short, and why legal teams need Contract AI that’s built to address the nine major technical challenges of using Generative AI with contracts.

In the first article of this series, we covered the foundational challenges that prevent ‘Contract GPT’ from working in enterprise environments: security risks with traditional RAG systems, data quality issues, and the complexities of context window management. These challenges often stop AI implementations before they even get started.

While fundamental, the first three technical challenges are far from the end of the story. In this article, we’ll explore the next three challenges, which can be summed up as reliability problems. Learn why AI often gets it wrong and how the right technological innovations can solve these issues.

Contract AI challenge 4: AI hallucinations

If you’ve used one of the publicly available large language models, you’ve likely experienced AI ‘hallucinations’ firsthand. This is when AI confidently provides information that’s completely wrong. In casual conversation, hallucinations might be amusing; in the context of business and legal work, they’re risky and potentially catastrophic.

When you try to use AI for contract creation, redlining, management, or analysis, hallucinations can show up as:

  • Entirely made-up contract terms
  • Mixed-up responsibilities between parties
  • Phantom dates and numbers, including renewal deadlines, payment amounts, or performance metrics
  • Fake citations and cross-references

The scariest part about hallucinations in general, and in Contract AI specifically, is that they often appear plausible. Without manually checking every output the AI provides, it can be impossible to identify them for what they are.

Pramata’s solution: Structured validation and quality controls.

Pramata minimizes hallucinations through multiple coordinated, patent-pending technological approaches, including:

  • Relationship Object Model: Rather than relying solely on free-form text generation, Pramata implements structured data architectures that constrain AI outputs into predefined formats and schemas (the general idea is sketched below).
  • Thought Process Support: Pramata requires its Contract AI to document its thought process step-by-step when analyzing contracts during the reasoning stage.
  • Context Window Overflow Monitoring: Pramata actively monitors context utilization during processing, alerting users when the AI is approaching thresholds that might impact reliability.

These are just a few of the 15+ ways Pramata drastically minimizes AI hallucinations, which means you can count on the results from your Contract AI. 
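To make the first of these approaches concrete, here is a minimal sketch of schema-constrained extraction in Python. It illustrates the general technique only, not Pramata’s actual Relationship Object Model; the RenewalTerms schema and its fields are hypothetical.

```python
# Minimal sketch: constrain AI output to a predefined schema so that
# malformed or hallucinated answers are rejected instead of trusted.
# Illustrative only; the schema and fields below are hypothetical.
from datetime import date
from pydantic import BaseModel, ValidationError

class RenewalTerms(BaseModel):
    counterparty: str        # the party named in the contract
    renewal_date: date       # must parse as a real date, no phantom dates
    auto_renews: bool
    notice_period_days: int

def validate_extraction(raw_json: str) -> RenewalTerms | None:
    """Accept the model's answer only if it fits the schema."""
    try:
        return RenewalTerms.model_validate_json(raw_json)
    except ValidationError:
        # Malformed or hallucinated output: route to re-prompting or
        # human review instead of passing it downstream.
        return None

# A well-formed response validates...
good = validate_extraction(
    '{"counterparty": "Acme Corp", "renewal_date": "2026-03-31",'
    ' "auto_renews": true, "notice_period_days": 60}'
)
# ...while a response with a phantom date is rejected.
bad = validate_extraction(
    '{"counterparty": "Acme Corp", "renewal_date": "next spring",'
    ' "auto_renews": true, "notice_period_days": 60}'
)
assert good is not None and bad is None
```

Because every answer must validate against a known schema, a phantom date or a missing field cannot silently slip downstream; it fails validation and gets flagged for review.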

Contract AI challenge 5: Variability and reproducibility issues

Somewhat adjacent to the issue of AI hallucinations is that of reliability, consistency, and reproducibility in Contract AI. If you ask a person to look through a contract for a piece of information, you expect them to find the correct answer each time they look, even if they don’t word it exactly the same way every time. Like a human, Gen AI can produce uniquely worded (yet still correct) answers. Unfortunately, it can also produce varying, and incorrect, responses.

The most significant problems that come from Gen AI’s variability are:

  • Inconsistent contract interpretations
  • Unpredictable outputs
  • Difficult quality assurance
  • Risk of contradictory guidance

At enterprise scale, where organizations often deal with thousands or tens of thousands of contracts, this variability is a deal-breaker. Legal teams need to know that Contract AI will consistently identify the same renewal dates, extract the same payment terms, and flag the same risks regardless of who runs the analysis or when they run it.
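To see why this happens, consider what a generic LLM call looks like. The sketch below assumes the OpenAI Python SDK purely for illustration (most hosted LLM APIs expose similar settings) and shows how default sampling invites run-to-run variation.

```python
# Illustration of run-to-run variability with a generic LLM API.
# Assumes the OpenAI Python SDK; the model name is an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "What is the renewal notice period in this clause?"
clause = "Either party may terminate with sixty (60) days' written notice."

answers = set()
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{question}\n\n{clause}"}],
        # Default sampling (temperature around 1.0) means each call can
        # take a different path through the model's output distribution.
    )
    answers.add(resp.choices[0].message.content)

# With default settings, len(answers) is often greater than 1: the same
# question about the same clause yields differently worded, and sometimes
# differently valued, responses.
print(len(answers), "distinct answers across 5 identical calls")
```

Setting temperature to 0 and fixing a seed, where the API supports one, narrows this variance, but on its own that does not guarantee the consistent interpretations enterprise legal teams need.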

Pramata’s solution: Standardized processes.

Unlike feeding your contracts into ChatGPT, Pramata uses proprietary technology to ensure reliable, consistent, correct answers each time someone asks a question about your contracts.

Pramata does this through the following innovations:

  • TrueDoc OCR: Pramata’s TrueDoc OCR technology precisely captures complex contract elements, including tables, multi-page clauses, and intricate layouts, that traditional OCR often misses or can’t process.
  • Extract & Enrich: Pre-tags key contract clauses so the system doesn’t need to rediscover clause types repeatedly when analyzing thousands of contracts.
  • Templates & Playbooks: Help our Contract AI standardize contract risk assessment criteria and guidelines, providing consistent benchmarks for the AI to evaluate contract positions against organizational policies and risk thresholds.
  • Contract Families: Maintains contract relationship precedence, enabling the AI to determine effective terms across multiple related contracts.
  • Multi-LLM Support: Allows the platform to leverage various LLMs based on the requirements of the specific task, optimizing AI performance and accuracy.
  • Pramata Prompt Language: Our patent-pending Pramata Prompt Language uses expert-designed prompts that target the precise, most relevant parts of your contracts instead of requiring the AI to comb through every word of every contract.
  • AI Agents (and Control Flow): Pramata’s targeted approach means the AI can focus its processing power on analysis rather than discovery.
  • Context Window Overflow Monitoring: Pramata actively monitors context utilization during processing, alerting users when the AI is approaching thresholds that might impact reliability.
  • Few-Shot Prompting Support: Utilizes a technique that provides the AI with a small number of clear examples of expected inputs and outputs (a generic illustration follows below).
  • Thought Process Support: Pramata requires its Contract AI to document its thought process step-by-step when analyzing contracts during the reasoning stage.

Each of these technologies has been engineered, tested, and proven to produce reliable and repeatable responses – even when dealing with tens of thousands of contracts or more.
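As one illustration of the techniques above, here is what generic few-shot prompting looks like in practice. This is a hypothetical example of the technique itself, not Pramata’s actual patent-pending prompts.

```python
# Generic few-shot prompting sketch: a handful of worked examples pins
# down the expected output format before the model sees the real clause.
# Hypothetical example; not Pramata's actual prompts.
FEW_SHOT_PROMPT = """\
Extract the payment term from the clause. Answer with the number of days only.

Clause: "Invoices are due within thirty (30) days of receipt."
Answer: 30

Clause: "Payment shall be made net forty-five (45) days from invoice date."
Answer: 45

Clause: "{clause}"
Answer:"""

def build_prompt(clause: str) -> str:
    """Drop the target clause into the few-shot template."""
    return FEW_SHOT_PROMPT.format(clause=clause)

print(build_prompt("Payment is due within ninety (90) days of acceptance."))
```

The worked examples anchor both the format and the precision expected, so the model is far less likely to return a differently shaped answer on each run.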

Download the full Pramata report here.

Contract AI challenge 6: Doing it at enterprise scale

We don’t recommend it, but if you really wanted to, you could put a few contracts into the LLM of your choice and then ask questions about the contents of those contracts. And the Gen AI might get it (mostly) right…for a little while.

But enterprise organizations don’t analyze one or two contracts at a time. They need to process thousands of contracts quickly, consistently, and cost-effectively. That’s where generic AI approaches completely break down because:

  • Running contracts one-by-one through standard AI interfaces is prohibitively time-consuming.
  • Maintaining consistent analysis quality across thousands of contracts, including those on third-party paper, requires sophisticated approaches impossible to achieve by feeding contracts into generic AI models.
  • The value large organizations get from AI-driven contract analysis depends on the ability to analyze and find patterns across a large number of contracts.

Most organizations hit a wall when trying to scale generic AI for contract analysis because the system becomes too slow, too expensive, or too unreliable to deliver real business value.

Pramata’s solution: Purpose-built architecture for enterprise scale.

There’s a reason Pramata is Enterprise Grade Contract AI that actually works. It was designed from the ground up for enterprise-scale contract generation, redlining, management, and analysis. Pramata’s patent-pending technology brings the speed, accuracy, and reliability organizations and their legal teams need to get real business value from Contract AI through:

  • Extract & Enrich: Pre-tags key contract clauses so the system doesn’t need to rediscover clause types repeatedly when analyzing thousands of contracts.
  • Multi-LLM Support: Allows the platform to leverage various LLMs based on the requirements of the specific task, optimizing AI performance and accuracy.
  • Pramata Prompt Language: Our patent-pending Pramata Prompt Language uses expert-designed prompts that target the precise, most relevant parts of your contracts instead of requiring the AI to comb through every word of every contract.
  • AI Agents (and Control Flow): Pramata’s targeted approach means the AI can focus its processing power on analysis rather than discovery.
  • Scalable Agent Processing & Reporting: Pramata’s agent-powered reporting capability transforms individual AI analyses into standardized, tabular outputs that maintain consistency across the enterprise (a simplified sketch follows below).

When you need reliable, accurate, consistent results from your Contract AI at large scale, it’s important to pay attention to these capabilities. Without the types of technology Pramata has developed to ensure our Contract AI performs with the highest degree of accuracy, using generative AI for contracts can open your business up to more risk than it’s worth.
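To illustrate the kind of standardized, tabular output the last capability describes, here is a minimal Python sketch. It shows the generic pattern only, not Pramata’s actual agent pipeline; the analyze stub and the column names are hypothetical.

```python
# Generic pattern: fan per-contract analyses out concurrently and land
# every result in the same columns, so a large portfolio can be compared
# as one table. Hypothetical sketch; not Pramata's actual pipeline.
import csv
from concurrent.futures import ThreadPoolExecutor

COLUMNS = ["contract_id", "counterparty", "renewal_date", "payment_term_days"]

def analyze(contract_id: str) -> dict | None:
    """Placeholder for a schema-validated AI extraction (see challenge 4).

    A real implementation would call the LLM, validate the response
    against a schema, and return one flat record per contract. This
    stub returns None, so the filter below drops its output.
    """
    return None

def build_report(contract_ids: list[str], path: str = "portfolio.csv") -> None:
    # Run analyses concurrently rather than one contract at a time.
    with ThreadPoolExecutor(max_workers=8) as pool:
        records = list(pool.map(analyze, contract_ids))
    # Every record lands in the same columns, so results are comparable
    # no matter who ran the analysis, or when.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(r for r in records if r is not None)
```

The essential point is that every contract’s analysis lands in identical columns, so a ten-thousand-contract portfolio can be filtered, sorted, and compared rather than read one answer at a time.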

The technical challenges don’t stop here.

With six of the nine largest challenges covered, there are still more reasons why loading hundreds of contracts into an LLM alone won’t produce the results attorneys and legal teams require. Make sure to come back for part three of this series, or download Pramata’s entire whitepaper ‘9 Major Technical Challenges that Come With Using Generative AI for Contracts’ now.

[ This is a sponsored thought leadership article by Pramata for Artificial Lawyer. ]

