Finding the ‘Needle in the Haystack’ with AI – Bloomberg Law

Finding the ‘Needle in the Haystack’ with Artificial Intelligence

By David Kleiman, Bloomberg Law

For attorneys, especially junior associates, finding relevant information is a daily – and even nightly – activity.

Associates often pull all-nighters trying to find that ‘needle in a haystack’: the perfect case that supports their legal argument, or the EDGAR filing that applies to their deal. As a former litigation associate, I spent many sleepless nights worrying that I had missed an important case in a research memorandum submitted to a partner or, worse yet, in a motion submitted to the court.

In today’s legal world, this worry is compounded by the sheer volume of online legal information: court opinions, agency materials, statutes, regulations, books, practice guides, law reviews, legal white papers, news, and much more. This information overload has eroded confidence in the legal research process, often leaving attorneys unsure whether they have found the right information. That uncertainty is what makes the application of artificial intelligence so important.

In my previous article, Demystifying AI for Lawyers: Supervised Machine Learning, I discussed how supervised machine learning, a subset of AI that utilizes both technology and human expertise, can help attorneys gather insights from large sets of data. I also highlighted how Bloomberg Law’s Smart Code℠ and Points of Law products have harnessed this technology to greatly enhance the legal research process by surfacing relevant documents and legal information faster than ever before.

In this article, I want to talk about another subset of AI that is often overlooked when discussing the application of AI in the legal industry – Natural Language Processing (NLP).

What is Natural Language Processing?

Natural Language Processing is a form of artificial intelligence that enables computers to analyze, understand, and derive meaning from human language. Stated another way, it is the ability of a computer program to recognize and interpret human linguistic patterns and tendencies so that users can submit queries in their natural language instead of ‘computer speak’. In the legal industry, this means attorneys can query large amounts of research and data using the everyday language of legal practice.

For an NLP-based solution to be effective, it must be tailored to its specific application. Because the words, phrases, and entities used in the legal industry have very specific meanings, an NLP-based solution developed for attorneys must account for those meanings and legal concepts.

As an example, Bloomberg Law’s NLP-based search system was developed, in part, by measuring its performance against relevance judgments obtained from a panel of legal analysts and by observing how users actually interacted with the system. This evaluation helped the Bloomberg Law search team identify where the system did and did not perform well. The team then used that information to design improvements that better reflect what attorneys find relevant and that address underperforming queries.
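To make the idea of evaluating search against analyst relevance judgments more concrete, here is a minimal sketch using precision at k, one common retrieval metric. It is not Bloomberg Law’s evaluation method; the metric choice, the `underperforming_queries` helper, the 0.5 threshold, and the document IDs are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k ranked documents that analysts judged relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def underperforming_queries(
    results: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
    k: int = 5,
    threshold: float = 0.5,  # hypothetical cutoff for "underperforming"
) -> List[str]:
    """Return queries whose precision@k falls below the threshold."""
    flagged = []
    for query, ranked in results.items():
        relevant = judgments.get(query, set())
        if precision_at_k(ranked, relevant, k) < threshold:
            flagged.append(query)
    return sorted(flagged)


# Hypothetical system output and analyst judgments (document IDs are made up).
results = {
    "strict liability product defect": ["d1", "d2", "d3", "d4", "d5"],
    "Rule 10b-5 scienter": ["d6", "d7", "d8", "d9", "d10"],
}
judgments = {
    "strict liability product defect": {"d1", "d2", "d3", "d5"},
    "Rule 10b-5 scienter": {"d9"},
}

print(underperforming_queries(results, judgments))
```

Here the first query scores 0.8 (four of its top five results were judged relevant) and the second scores 0.2, so only `['Rule 10b-5 scienter']` is printed and flagged for attention. A production system would likely combine several metrics with signals from real user interactions.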

Bloomberg Law’s NLP system can also parse each search query to identify the entities it contains and assess the relationships between those entities. An ‘entity’ for this purpose could be a statute or regulation, a person, a court, or even a legal concept (e.g., strict liability). Documents discussing those entities are then identified and scored based on how prominently the entities are featured and how they are related within the document.
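The scoring idea can be illustrated with a deliberately simple sketch. It assumes the query’s entities have already been extracted, treats entities that appear earlier in a document as more prominent, and treats two entities in the same sentence as "related." These heuristics, the `cooccur_bonus` weight, and the sample documents are hypothetical and do not describe how Bloomberg Law actually scores documents.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: breaks after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def entity_score(doc: str, entities: List[str], cooccur_bonus: float = 2.0) -> float:
    """Score a document by how early and how often query entities appear,
    with a bonus when two or more entities share a sentence."""
    wanted = [e.lower() for e in entities]
    score = 0.0
    for position, sentence in enumerate(split_sentences(doc.lower())):
        present = [e for e in wanted if e in sentence]  # each entity counted once per sentence
        weight = 1.0 / (1 + position)  # earlier sentences count as more prominent
        score += weight * len(present)
        if len(present) >= 2:  # co-occurrence used as a crude "relationship" signal
            score += cooccur_bonus * weight
    return score


def rank_documents(docs: Dict[str, str], entities: List[str]) -> List[str]:
    """Return document IDs ordered from highest to lowest entity score."""
    return sorted(docs, key=lambda doc_id: entity_score(docs[doc_id], entities), reverse=True)


# Hypothetical query entities and documents.
entities = ["strict liability", "Restatement (Second) of Torts"]
docs = {
    "doc_b": "The court discussed damages. Strict liability was mentioned briefly.",
    "doc_a": "Strict liability applies under the Restatement (Second) of Torts. The court agreed.",
}
print(rank_documents(docs, entities))
```

`doc_a` mentions both entities together in its first sentence (score 4.0), while `doc_b` mentions only one entity in its second sentence (score 0.5), so the output is `['doc_a', 'doc_b']`. Real systems would rely on proper entity recognition and parsing rather than substring matching.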

NLP for Legal Intelligence

As one can imagine, given the volume of legal data, it can also be challenging to accurately identify and disambiguate the names of persons and organizations that appear in legal documents. Doing this well is important to returning the right information.

For example, in many instances the name of a law firm as mentioned in a legal document may not match the ‘official’ name of the firm. The name appearing may be an informal version, a misspelling, or even an outdated name. ‘Skadden Arps Slate Meagher & Flom LLP’ may be shortened to ‘Skadden Arps Slate Meagher & Flom’ or just ‘Skadden Arps’.

The ‘LLP’ designation may be dropped or written as ‘L.L.P.’ In this example, the relatively distinctive names ‘Skadden’ and ‘Arps’ provide clues to the identity of the firm being mentioned, but in other cases, such as the law firms ‘Steptoe & Johnson LLP’ and ‘Steptoe & Johnson PLLC’, additional contextual clues such as addresses may be needed to determine which firm is meant.
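The name variations described above can be sketched as a small normalization and matching step. This is an illustrative assumption, not Bloomberg Law’s approach: names are lowercased, periods and commas are removed (so ‘L.L.P.’ becomes ‘llp’), a trailing designation is treated as optional, and a mention matches a canonical firm name when its remaining words form a prefix of the firm’s name. The `DESIGNATIONS` list and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Entity-type suffixes treated as optional when matching (an assumed, incomplete list).
DESIGNATIONS = {"llp", "pllc", "llc", "lp", "pc"}


def normalize(name: str) -> List[str]:
    """Lowercase, drop periods and commas, and split on whitespace."""
    return name.lower().replace(".", "").replace(",", "").split()


def split_designation(tokens: List[str]) -> Tuple[List[str], Optional[str]]:
    """Separate a trailing designation such as 'llp' from the core name tokens."""
    if tokens and tokens[-1] in DESIGNATIONS:
        return tokens[:-1], tokens[-1]
    return tokens, None


def candidate_firms(mention: str, firms: List[str]) -> List[str]:
    """Return canonical firm names consistent with a mention.

    A firm matches if the mention's core tokens are a prefix of the firm's core
    tokens and, when the mention carries a designation, the designations agree.
    """
    m_core, m_desig = split_designation(normalize(mention))
    if not m_core:
        return []
    matches = []
    for firm in firms:
        f_core, f_desig = split_designation(normalize(firm))
        if f_core[: len(m_core)] == m_core and (m_desig is None or m_desig == f_desig):
            matches.append(firm)
    return matches


FIRMS = [
    "Skadden Arps Slate Meagher & Flom LLP",
    "Steptoe & Johnson LLP",
    "Steptoe & Johnson PLLC",
]

print(candidate_firms("Skadden Arps", FIRMS))
print(candidate_firms("Skadden Arps Slate Meagher & Flom L.L.P.", FIRMS))
print(candidate_firms("Steptoe & Johnson", FIRMS))
```

Both Skadden variants resolve to the single canonical name, while a bare ‘Steptoe & Johnson’ still returns two candidates. That leftover ambiguity is exactly the case where surface matching runs out and contextual clues are needed.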

While no single method can perfectly disambiguate names all the time, NLP techniques such as named entity recognition and named entity disambiguation – two very active areas of research and practice in the NLP community – can help identify the correct names and return the associated documents an attorney is looking for.
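One simple way to use context for disambiguation is word overlap: compare the text surrounding a mention with what is known about each candidate (for instance, an address) and pick the candidate that shares the most words. The sketch below uses that basic overlap heuristic, which is far simpler than the named entity disambiguation methods used in practice. The firms and addresses are fictional, and the `disambiguate` helper is hypothetical.

```python
import re
from typing import Dict, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(context: str, profiles: Dict[str, str]) -> Optional[str]:
    """Choose the candidate whose profile shares the most words with the context.

    Returns None when no candidate overlaps or when the best score is tied,
    i.e. when the context does not separate the candidates.
    """
    ctx = tokens(context)
    scores = {firm: len(ctx & tokens(profile)) for firm, profile in profiles.items()}
    best = max(scores.values(), default=0)
    winners = [firm for firm, score in scores.items() if score == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


# Fictional firms and addresses used purely for illustration.
PROFILES = {
    "Example & Partners LLP": "1200 Main Street, Springfield, Illinois",
    "Example & Partners PLLC": "400 River Road, Riverton, Wyoming",
}

print(disambiguate("Counsel: Example & Partners, 1200 Main Street, Springfield, IL", PROFILES))
print(disambiguate("Counsel: Example & Partners", PROFILES))
```

The first mention resolves to the LLP because its address words overlap that profile. The second returns `None` because the name alone gives no basis for choosing, which is the sensible outcome when evidence is missing.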

In today’s data-rich world, litigators, transactional lawyers, and corporate counsel alike need to understand which tools and technologies will best enable them to sift through vast amounts of data to find the one or two critical documents they need.

Although many attorneys may never feel fully confident in their research, they can sleep easier knowing that their research tools employ technologies such as NLP and machine learning to help them find that ‘needle in the haystack’.

(Artificial Lawyer is proud to bring you this sponsored thought leadership article by Bloomberg Law.)

Learn more about Bloomberg Law at and follow on Twitter @BloombergLaw.

About the Author: David Kleiman is a product manager for Bloomberg Law, where he focuses on developing innovative workflow tools and features for litigators. Kleiman was a litigator at Arnold & Porter and clerked for retired United States District Court Judge Shira A. Scheindlin. He earned his JD from Brooklyn Law School and BA from Binghamton University.