This is a Sponsored Thought Leadership article by Kevin Gidney, Co-Founder and CTO of legal AI, contract discovery and analysis company, Seal Software.
What would a world without rules be like? A world where only past experience drives future outcomes. Could we see this become reality? Consider driving on a road: we have simple, clear rules to follow.
Even self-driving cars such as Teslas have clearly defined rules for handling set conditions that a data-driven model may have recognised, for example, telling the difference between a tree and a person standing still. But I am sure, as you all are, that the decision about which of the two to crash into is not driven by historical data alone.
With this in mind, it comes as some surprise that several industry commentators are talking about data-driven models winning the war against rules-based systems.
The surprise is not that this view is being shared, but that these commentators treat it as a binary choice. As the example above shows, it is not a binary choice, and we have built this philosophy into the Seal platform.
Rules are excellent methods for making clear and concise decisions. Written correctly, you know with 100% certainty that a match to the rule is correct.
Data-driven models, on the other hand, are not that straightforward. They make predictions based on probability, and you can never be 100% assured that a prediction is correct. You can have a high probability that it is, but never a guarantee. (The exception is if you have managed to create a unicorn model that generates an F score of 1, where recall and precision are also 1, but that is practically impossible.)
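To make that point concrete, here is a minimal sketch of the F score (F1) as the harmonic mean of precision and recall; the function is illustrative, not taken from any particular library:

```python
def f_score(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The "unicorn model": only perfect precision AND perfect recall give F = 1.
print(f_score(1.0, 1.0))                 # 1.0
# A more realistic extractor falls short on at least one of the two:
print(round(f_score(0.95, 0.80), 3))     # 0.869
```

Because the harmonic mean is dominated by the smaller of the two values, a model can only reach an F score of 1 when both precision and recall are exactly 1.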
Another point to consider is that rules can accurately account for ranges of values and then take actions based on the range presented: comparing, for example, a monetary value to a range of pre-defined acceptable values.
Statistical models do not do this; they simply detect the data in question and predict an outcome based on the values provided. When creating statistical models, it is also worth noting that you need significantly more manually reviewed information than with a rule, as using less data will create models with very high standard deviations (see the other Seal post for additional information).
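A range-based rule of the kind described above can be sketched in a few lines. The clause name, thresholds and action labels here are hypothetical, chosen purely for illustration, and are not Seal's actual rules:

```python
# Illustrative rule: route a contract's monetary value to an action
# depending on where it falls within a pre-defined acceptable range.
def check_liability_cap(amount, low=100_000, high=1_000_000):
    """Return an action based on the range the amount falls into."""
    if amount < low:
        return "auto-approve"       # comfortably below the threshold
    if amount <= high:
        return "flag-for-review"    # inside the acceptable range
    return "escalate"               # above the acceptable range

print(check_liability_cap(50_000))     # auto-approve
print(check_liability_cap(250_000))    # flag-for-review
print(check_liability_cap(5_000_000))  # escalate
```

A rule like this is deterministic: the same value always produces the same action, with no probability attached.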
Finally, let’s consider the normalisation of information such as units of measure: days, weeks, months, years, and so on. Both methods can detect these items, and models such as maximum entropy with conditional random fields (MaxEnt and CRF), for example, are able to detect and extract entities such as a measure of time. But you then need some post-processing to normalise the data from text strings into formatted, searchable and actionable data points.
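A minimal sketch of that post-processing step might look like the following. The function and the unit conversions are assumptions for illustration (months and years are approximated as 30 and 365 days), not Seal's normalisation logic:

```python
import re

# Approximate day counts per unit, for illustration only.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}

def normalise_duration(text):
    """Turn an extracted time string such as '3 weeks' into days."""
    match = re.match(r"(\d+)\s*(day|week|month|year)s?", text.strip().lower())
    if not match:
        return None
    count, unit = int(match.group(1)), match.group(2)
    return count * UNIT_DAYS[unit]

print(normalise_duration("3 weeks"))   # 21
print(normalise_duration("2 Months"))  # 60
```

Once normalised into a number, the value becomes searchable and comparable, so the range-based rules described earlier can act on it.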
It should be clear that a combination of models and rules generates the best results. One of the simplest ways to see this is that rules for NLP and information detection can be highly precise: if a rule matches, you are guaranteed the result is correct, but coverage will be low because you are favouring precision over recall. Models, by contrast, generalise, so you are never sure an answer is 100% correct; you are favouring recall over precision. So why not combine them?
Combining methods and techniques in this way can be referred to as layering, ensembles or, in neural networks, hidden layers. As each method or layer performs its given task, it passes its results to the following method or layer. A good example of this was the Netflix Prize, where different models were merged into an ensemble to produce the winning results. By combining methods such as rules and models, it is possible to gain high accuracy and high recall within the shortest possible time.
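The layering idea can be sketched as a tiny two-stage pipeline. Everything here is a toy stand-in (the pattern, the "model" and the confidence values are invented for illustration, not Seal's pipeline): a high-precision rule runs first, and a probabilistic layer covers whatever the rule misses:

```python
import re

def rule_layer(sentence):
    """High-precision layer: returns an amount only on an exact pattern match."""
    m = re.search(r"\$\s?([\d,]+)", sentence)
    return m.group(1).replace(",", "") if m else None

def model_layer(sentence):
    """Stand-in for a statistical extractor: a guess plus a confidence."""
    for token in sentence.split():
        if token.rstrip(".").isdigit():
            return token.rstrip("."), 0.7   # plausible, but not guaranteed
    return None, 0.0

def extract_amount(sentence):
    """Layered extraction: try the rule first, fall back to the model."""
    value = rule_layer(sentence)
    if value is not None:
        return value, 1.0                   # rule matched: certain
    return model_layer(sentence)

print(extract_amount("The fee is $10,000 per year."))  # ('10000', 1.0)
print(extract_amount("The fee is 10000 dollars."))     # ('10000', 0.7)
```

The first layer contributes precision, the second contributes recall, and each result carries a confidence that downstream layers can act on.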
How does this relate to contracts and contract review? Think of two methods: rules that you know are 100% accurate when they match, and a model that gives good recall but whose answers you cannot be sure are correct without reviewing all the data.
In this case, if an item matches both the rule and the model, you can be sure it does not need review, because the rule match tells you exactly that it meets the set criteria. This method allows users to dramatically reduce the amount of review while achieving the best possible recall and precision.
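The triage logic described above reduces to a single condition. This is a hypothetical helper, not Seal's actual API; it simply encodes the idea that agreement between a rule and a model lets an item skip manual review:

```python
# An item skips manual review only when both the rule and the model agree,
# because the rule match guarantees the criteria are met.
def needs_review(rule_match, model_match):
    return not (rule_match and model_match)

print(needs_review(rule_match=True, model_match=True))    # False: skip review
print(needs_review(rule_match=False, model_match=True))   # True: send to a human
```

Everything the model finds that the rule cannot confirm still goes to a reviewer, which is where the time savings come from: humans only see the uncertain remainder.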
It is this combination of methods, models and layers that we have used within Seal from its inception: taking the best and most appropriate methods to accomplish a given task, and combining them into a best-of-breed (ensemble) solution for information extraction and normalisation.