AI Beats Human Lawyers in CaseCrunch Prediction Showdown + DATA UPDATES

Many readers will know by now that UK-based legal tech start-up CaseCrunch challenged lawyers to see whether human experts or its software could more accurately predict the outcome of a number of financial product mis-selling claims. The showdown has now been completed, with the results announced at the offices of insurance law firm Kennedys last night (27 Oct).

As explained in the statement below, CaseCrunch’s* predictive algorithms and modelling of legal issues came out on top, scoring almost 87% accuracy in predicting the success or failure of a claim. The English lawyers they beat achieved an overall accuracy of around 62%.

Also, at the end of the piece is some additional data on how the experiment was conducted, released today (30 Oct).


CaseCrunch is proud to announce the results of the lawyer challenge. CaseCruncher Alpha scored an accuracy of 86.6%. The lawyers scored an accuracy of 62.3%.

Over 100 commercial London lawyers signed up for the competition and made more than 750 predictions during the course of a week in an unsupervised environment.

The problems were real complaints about PPI mis-selling decided and published by the Financial Ombudsman Service (FOS) under the Freedom of Information Act. (N.B. Payment Protection Insurance, or PPI, has been a massive issue in the UK, with banks having to pay customers billions of pounds in total refunds for making consumers take on insurance products they never required.)

The main reason for the large winning margin seems to be that the network had a better grasp of the importance of non-legal factors than the lawyers did. We will soon publish a research paper explaining the approach we took and its broader significance for legal theory.

Evaluating these results is tricky. These results do not mean that machines are generally better at predicting outcomes than human lawyers. These results show that if the question is defined precisely (such as: was this complaint about PPI mis-selling upheld or rejected by the FOS?), machines are able to compete with and sometimes outperform human lawyers.
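[N.B. CaseCrunch has not yet published its method, so the sketch below does not reflect its actual system. Purely to illustrate what a "precisely defined question" looks like to a machine, here is a minimal Python sketch of the general setup such a system implies: a supervised binary classifier trained on the text of past decisions, with toy stand-in data.]

    # Minimal sketch of outcome prediction as binary text classification.
    # Assumption: this is NOT CaseCrunch's method, which is unpublished.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for the anonymised facts of past FOS decisions.
    facts = [
        "Customer was told PPI was compulsory when taking out the loan.",
        "Customer signed a separate form confirming PPI was optional.",
        "Adviser did not check the customer's existing insurance cover.",
        "Policy terms and cancellation rights were clearly explained.",
    ]
    upheld = [1, 0, 1, 0]  # 1 = complaint upheld, 0 = rejected

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(facts, upheld)

    # The precisely defined question: will this complaint be upheld?
    print(model.predict(["PPI was added without the customer's knowledge."]))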

This experiment also suggests that there may be factors other than legal factors contributing to the outcome of cases. Further research is necessary to establish this proposition beyond the specific parameters of this experiment.

The use case for systems like CaseCruncher Alpha is clear. Legal decision prediction systems like ours can solve legal bottlenecks within organisations permanently and reliably.

We are very thankful to the legal community for their enthusiastic participation and to our sponsors for their generous support. We also want to thank our technical judge, Premonition, and our legal judge, Dr Steffek, for their involvement.


New Data (released 30th Oct)

Overview:

Throughout October, 112 lawyers pre-registered to participate in the Lawyer Challenge, which ran from 20th to 27th October. They were presented with factual scenarios of PPI mis-selling claims and asked to predict, yes or no, whether the Financial Ombudsman would uphold the claim. The same factual scenarios were given to CaseCrunch: whoever had the highest accuracy won. In total, participants submitted 775 predictions.
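In other words, the scoring rule is plain classification accuracy: the share of yes/no predictions that match the Ombudsman’s actual decision. A minimal Python sketch (the data here is an invented stand-in, not competition data):

    def accuracy(predictions, outcomes):
        # Fraction of yes/no predictions that match the actual FOS outcomes.
        correct = sum(p == o for p, o in zip(predictions, outcomes))
        return correct / len(outcomes)

    predictions = ["yes", "no", "yes", "no"]   # stand-in predictions
    outcomes    = ["yes", "yes", "yes", "no"]  # stand-in FOS decisions
    print(f"{accuracy(predictions, outcomes):.1%}")  # -> 75.0%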

A Technology Judge and a Legal Judge independently verified the fairness of the competition. The Legal Judge was Felix Steffek (LLM, PhD), University of Cambridge Lecturer in Law and Co-Director of the Centre for Corporate and Commercial Law. The Technical Judge was Ian Dodd, UK Director of Premonition.

The factual scenarios were real decided cases from the Financial Ombudsman Service, published under the FOIA. All identifying details, such as the names of the parties, case names and dates, were removed, leaving only the facts. Lawyers completed their predictions in an unsupervised environment and were permitted to use all available resources. PPI mis-selling was chosen as the basis of the competition because it matched the background of most lawyers taking part in the challenge and is an area of law that is easier to learn about than others. Participants were given links to the Financial Conduct Authority’s rules detailing the basis of an Ombudsman’s decision.
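The release does not say how the anonymisation was done, but in Python it might look something like the following; the patterns are illustrative assumptions, not the organisers’ actual procedure:

    import re

    # Illustrative sketch only: strip names, dates and case references
    # from a decision text, leaving the facts. Patterns are assumptions.
    MONTHS = ("January|February|March|April|May|June|July|"
              "August|September|October|November|December")

    def anonymise(text):
        text = re.sub(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b", "[DATE]", text)
        text = re.sub(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+\b", "[PARTY]", text)
        text = re.sub(r"\bDRN-?\d+\b", "[CASE REF]", text)  # assumed ref style
        return text

    print(anonymise("Mr Smith complained on 3 March 2014 under DRN1234567."))
    # -> [PARTY] complained on [DATE] under [CASE REF].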

Dr Steffek said: ‘The factual descriptions of the problems set by the Financial Ombudsman Service are a reasonable basis for a prediction about PPI mis-selling complaints being upheld or rejected by the Ombudsman at an early stage in the advisory process. Trained lawyers from commercial London law firms, using all the tools and resources they usually work with, are able to make reasonable predictions about these problems at this point even though the information given per claim varies and further information might be revealed at later stages.’

Participants:

112 lawyers competed in the Challenge, among them Magic Circle partners, barristers and in-house counsel. Participating law firms included Bird & Bird, Kennedys, Allen & Overy, Berwin Leighton Paisner, DLA Piper, DAC Beachcroft, Weightmans and more. “Teams” were entered from large firms including Pinsent Masons and Eversheds Sutherland.

Results:

The lawyers scored an accuracy of 62.3%. CaseCruncher Alpha (the system entered into the competition by CaseCrunch) scored a validation accuracy of 86.6%.

Ian Dodd, UK Director of Premonition, concluded:

‘The session I observed produced an accuracy of 86.6%. It would also be interesting to put a £ value on the processing cost. The real number of “Human: 62.3% at £300 p/h and X hours” compared to “AI: 86.6% at £17 p/h and X hours” is the true bottom line.’
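Dodd leaves the hours as an unknown “X”, and they remain unknown. Purely to make the shape of his comparison concrete, here is the arithmetic as a short Python sketch; the hour figures are hypothetical placeholders, and only the rates and accuracies come from the quote:

    # Cost per correct prediction, using the quoted rates and accuracies.
    # The hour figures are hypothetical placeholders for Dodd's "X".
    def cost_per_correct(rate_gbp_per_hour, hours, accuracy, n_predictions):
        return (rate_gbp_per_hour * hours) / (accuracy * n_predictions)

    n = 775        # total predictions made in the challenge
    hours = 40.0   # placeholder for "X hours" on both sides
    print(f"Human: £{cost_per_correct(300, hours, 0.623, n):.2f} per correct prediction")
    print(f"AI:    £{cost_per_correct(17, hours, 0.866, n):.2f} per correct prediction")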


[*N.B. for readers who are wondering who CaseCrunch are, they were formerly Elexirr, then before that, LawBot.]