A Norwegian Legal Challenge: Using AI to Anonymise Court Docs

The number of uses for legal AI applications continues to grow and one team in Norway is now using this technology for something Artificial Lawyer has not seen before: removing people’s names from public court documents by making use of natural language processing (NLP) and machine learning.

You may ask: ‘Why not just do a word search? Surely that would suffice?’ Apparently, it’s not that easy. In a globalised Norway, with people from all over the planet, it’s not sufficient to search for the most popular traditional names. Moreover, forenames and surnames get misspelt in court records and don’t always get picked up even if you know what you’re looking for.
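To make the point concrete, here is a minimal sketch in Python. It is not Lovdata’s system: the name list, the sample sentence, the function names and the similarity cutoff are all invented for illustration. It shows an exact lookup missing a misspelt surname, a fuzzy lookup (using the standard library’s `difflib`) catching it, and both missing a name that is not on the list at all.

```python
import re
from difflib import get_close_matches

# Hypothetical name list; a real register would hold far more entries.
KNOWN_NAMES = ["Hansen", "Johansen", "Olsen", "Kari", "Nordmann"]

# Matches runs of letters, including non-ASCII ones such as æ, ø and å.
TOKEN = re.compile(r"[^\W\d_]+")


def redact_exact(text, names=KNOWN_NAMES):
    """Replace only words that appear verbatim in the name list."""
    name_set = set(names)
    return TOKEN.sub(
        lambda m: "[NAME]" if m.group() in name_set else m.group(), text
    )


def redact_fuzzy(text, names=KNOWN_NAMES, cutoff=0.8):
    """Replace capitalised words that closely resemble a listed name."""
    def replace(match):
        word = match.group()
        # Skip lower-case words to limit false positives (an assumption).
        if not word[0].isupper():
            return word
        close = get_close_matches(word, names, n=1, cutoff=cutoff)
        return "[NAME]" if close else word

    return TOKEN.sub(replace, text)


# "Nordman" is a misspelling of "Nordmann"; "Nguyen" is not on the list.
sample = "Kari Nordman met Johansen and Nguyen in Oslo."
print(redact_exact(sample))  # [NAME] Nordman met [NAME] and Nguyen in Oslo.
print(redact_fuzzy(sample))  # [NAME] [NAME] met [NAME] and Nguyen in Oslo.
```

Even with fuzzy matching, ‘Nguyen’ slips through because nothing on the list resembles it, which is the gap a trained name-recognition model would be expected to fill.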

And if a name in a sensitive case were published by mistake, perhaps buried deep in a long court decision, those involved in that publication would have broken the law. Something therefore had to be done if court records were to be made public and the very laborious process of manual checking was to be sped up.

Lovdata, a private foundation created in 1981 by the Ministry of Justice and the Faculty of Law in Oslo, has been trying to solve the problem. It has been using a variety of technologies and is now hoping that AI techniques, such as NLP and machine learning, will crack it.

Artificial Lawyer spoke to Tor Henning Ueland, a Senior Developer at Lovdata*, about this challenge.

>>> Why do you need to remove names from court papers in Norway? 

Court decisions are public information per se. However, for criminal cases, cases involving minors, family cases involving married or divorced parties and disabled persons, we have statutes that prohibit us from publishing the court decisions un-anonymised.

In addition, Norwegian statutes on privacy prohibit us in general from publishing decisions with sensitive personal data un-anonymised. If the court considers it impossible to publish a court decision without identifying the involved parties, it will prohibit us from publishing it.

>>> Can you tell us about your company and its focus on legal AI?

We are directing more and more resources towards AI and we are working on multiple products, both for in-house use and for our users. This includes text mining, classifiers and other NLP solutions.

Our main solution used today is an in-house solution written in Fortran back in 2003. Until that time, anonymisation was a 100% manual job. We will replace all Fortran solutions with new in-house Java solutions [and] we are testing different AI technologies, which might help us with this.

>>> When did this start? 

Since the mid-20th Century, journals with court decisions have been anonymised. It started with the publication of decisions from the Supreme Court back in the early 1940s, when decisions regarding sexual assaults first became anonymised.

In the following years, an increasing number of published court decisions became anonymised. The last published criminal case not anonymised in journals dates back to 1977. The Lovdata foundation was founded in 1981. We have been anonymising criminal cases and decisions regarding minors from day one.

>>> How successful have you been so far? What are the difficult parts? 

Our current solution, based on lists of names, struggles to pick up all kinds of misspellings of names, as well as names that have not already been added to our lists.

Our lists include all citizens registered in Norway. However, we receive an increasing number of court decisions involving non-Norwegians, and it is laborious to find all these names in court decisions.

For a while we have been experimenting with NER software to extract names not recognised by our current in-house solution. Recent tests have been promising, but due to our high quality demands we are not quite ready to implement this solution.

[Also] Norwegian being a small language, none of these models are pre-trained to understand it. Having to train the models before testing them has been a time-consuming task.

>>> Sounds like a fascinating project. Good luck!

[* Fact for the day: If you were wondering why a legal information company is called ‘Lov Data’, as I was, it’s because the Norwegian word for law is lov. Which also means that the Norwegian term ‘lover’ actually means ‘laws’. Yep, I thought that was interesting too. ]