AL Interview: ZyLAB’s 30-Year Journey Through eDiscovery + AI

Artificial Lawyer recently caught up with Johannes C. Scholtes, Chairman and Chief Strategy Officer of Dutch eDiscovery company ZyLAB, and his team, including Matthijs Luisman, Project Advisor. We discussed how the company, based in Amsterdam and the US, has been moving through successive iterations of data analysis technology for over three decades; how it is using AI tools; and the thorny question of what AI really means in the context of eDiscovery.

ZyLAB is over 30 years old – which is old for a legal tech company. Could you tell the readers about the journey to today?

Over the past 30+ years, ZyLAB has worked closely with corporations, law firms and government agencies on regulatory requests, high-frequency eDiscovery, M&A, contract discovery and review, FOIA and public records requests, investigations and audits.


During this journey, we have reinvented ourselves several times and offered different products and services to these target groups. Throughout, managing unstructured information has remained a problem, especially since the amount of information is now 100,000 times larger than 30 years ago and the number of formats and locations where such information resides has grown exponentially as well.

Additional requirements from regulatory agencies and ever-changing laws require continuous investment to stay competitive in our markets.

The Data Science R&D department of ZyLAB has close connections with the Department of Data Science and Knowledge Engineering of Maastricht University and other top universities around the world.

[For example, Johannes C. Scholtes has been a full professor in the Artificial Intelligence Department (Text-Mining chair) for 10 years now.]

The focus of ZyLAB’s data science team is to make eDiscovery smarter, better and faster by using intelligent search, information extraction, topic modeling and machine learning.

Over the years, ZyLAB has used this knowledge and continuously improved its technology to help organisations quickly and easily gain control of vast amounts of data.

At the core of the offering is eDiscovery, but with AI capability. Can you please explain what this AI capability is and how it works?

Machine learning, technology-assisted review (TAR), text mining, predictive coding and data analytics are all examples of AI techniques. These techniques have made eDiscovery smarter, faster and easier, and have drastically reduced review costs.

Using computers for Early Case Assessment (ECA) and TAR has huge benefits in eDiscovery: computers are faster and more accurate, and they consistently outperform human reviewers. By combining information extraction, clustering, machine learning and data visualization, these tools can provide insight into correlations, patterns, trends and other important information. The same techniques are also used for auto-redaction and auto-pseudonymization.

TAR is based on machine learning and the semantic representation of text. eDiscovery solutions like ZyLAB ONE eDiscovery use these techniques for automatic text and document classification, accelerating the review process by reducing the number of documents that need to be reviewed manually.
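To make the classification idea concrete, here is a minimal sketch of the general TAR pattern: a reviewer codes a small seed set, a classifier learns from it, and the remaining documents are ranked by predicted relevance so the likeliest-relevant ones are reviewed first. It assumes a plain bag-of-words multinomial Naive Bayes model, chosen here only for brevity; this is not ZyLAB's implementation, and the names `NaiveBayesTAR`, `prioritize` and the seed documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens: a deliberately simple stand-in for real text processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTAR:
    """Hypothetical multinomial Naive Bayes relevance model trained on reviewer-coded seed documents."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing so unseen word/class pairs never give log(0)
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab: set[str] = set()

    def fit(self, docs: list[str], labels: list[bool]) -> "NaiveBayesTAR":
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
            self.doc_counts[label] += 1
        if min(self.doc_counts.values()) == 0:
            raise ValueError("seed set needs both relevant and non-relevant examples")
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens: list[str], label: bool) -> float:
        """Log prior plus log likelihood of the known tokens under one class."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        log_prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        return log_prior + sum(
            math.log((counts[t] + self.alpha) / denom) for t in tokens if t in self.vocab
        )

    def relevance(self, doc: str) -> float:
        """Posterior probability that a document is relevant (two-class softmax)."""
        tokens = tokenize(doc)
        diff = self._log_score(tokens, False) - self._log_score(tokens, True)
        # p = 1 / (1 + exp(diff)), computed so that exp() never overflows
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


def prioritize(model: NaiveBayesTAR, docs: list[str]) -> list[tuple[float, str]]:
    """Rank unreviewed documents so reviewers see the likeliest-relevant ones first."""
    return sorted(((model.relevance(d), d) for d in docs), reverse=True)


# Hypothetical seed set coded by a human reviewer.
seed_docs = [
    "price fixing agreement with competitor",
    "competitor pricing meeting notes",
    "office holiday party schedule",
    "cafeteria menu for next week",
]
seed_labels = [True, True, False, False]
model = NaiveBayesTAR().fit(seed_docs, seed_labels)

unreviewed = ["notes on pricing agreement", "party menu"]
ranked = prioritize(model, unreviewed)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

In practice, TAR workflows usually iterate: reviewers code the top-ranked documents, the model is retrained on the enlarged training set, and the ranking is refreshed, so manual review can stop once few relevant documents remain near the top of the ranking.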

OK, here is a tricky question: a lot of e-discovery systems are described as using ‘AI’ although it’s not always clear whether they are doing more than ‘keyword’ search. How would you define what an AI-driven e-discovery system looks like? What are the key attributes that make it an AI system in your view?

In the eDiscovery process, an AI-driven solution should be used in three main areas:

  • Search and Analysis: the computer is used for human-like communication skills such as reading (Optical Character Recognition), dealing with speech (Audio Search) and with other languages (Machine Translation).
  • Early Case Assessment: advanced analytics and visualizations for fast and clear insights early in the process.
  • Review accelerators such as Technology-Assisted Review or auto-redaction and auto-pseudonymization to speed up the most expensive part of eDiscovery.


You are based in Holland, but your system is language agnostic. Can you explain how it can be agnostic? Why is there no need to train in Dutch? Or English?

Most of our clients are from outside the Netherlands. The US market is our main market. We have dual headquarters: one in McLean, Virginia, and one for EMEA in Amsterdam, the Netherlands.

For more than 20 years, our systems have supported over 400 languages for search and basic tasks such as optical character recognition (OCR). Many of our data science techniques are language agnostic. We are able to do this because the algorithms are based on statistical models instead of linguistic models. This applies to all our topic modeling and machine learning algorithms.

There are of course language dependencies in the more advanced functionalities of our system (such as machine translation, audio search, and text-mining), but we have taught these to our systems by training the algorithms with relevant data for each language we support. We can deal with different languages because we can recognize the language of documents automatically and then apply the right statistical models.
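The "recognize the language, then apply the right model" step can be illustrated with a small statistical sketch. It assumes a common textbook technique, comparing character-trigram frequency profiles with cosine similarity, rather than anything ZyLAB has described; the tiny `TRAINING` samples and the `HANDLERS` routing table are hypothetical placeholders for real per-language corpora and models.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word boundaries are captured."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy training samples; a real system would build each profile
# from a large corpus per supported language.
TRAINING = {
    "en": "the court ordered the parties to produce all relevant documents and emails "
          "within thirty days of the request",
    "nl": "de rechter heeft de partijen opgedragen om alle relevante documenten en "
          "e-mails binnen dertig dagen na het verzoek te overleggen",
}
PROFILES = {lang: char_ngrams(text) for lang, text in TRAINING.items()}


def detect_language(text: str) -> str:
    """Return the language whose profile is most similar to the text's profile."""
    grams = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))


# Hypothetical per-language handlers standing in for language-specific models.
HANDLERS = {
    "en": lambda doc: f"[en model] {doc}",
    "nl": lambda doc: f"[nl model] {doc}",
}


def route(doc: str) -> str:
    """Detect the document's language, then hand it to that language's model."""
    return HANDLERS[detect_language(doc)](doc)


print(route("the documents were produced to the court"))
print(route("de documenten zijn aan de rechter overgelegd"))
```

Because the detector only counts character sequences, supporting another language in this sketch means adding training text and a handler; no language-specific grammar rules are involved, which mirrors the statistical-versus-linguistic distinction described above.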

You mentioned that your clients are from around the world. Where are most of ZyLAB’s clients, and what kinds of companies and law firms are they?

We have clients all over the world, but our biggest markets are the US, UK and Benelux. ZyLAB’s products are trusted by Fortune 1000 companies, government agencies, courts, regulatory agencies and law firms worldwide, with nearly 10,000 installations and 2 million users.

Our marquee clients include all the UN War Crimes Tribunals (Rwanda, Sierra Leone, Yugoslavia, Cambodia, …), the EOP of the White House, the European Commission’s anti-fraud office OLAF, the FTC and many other high-profile clients.

Legal AI is growing around the world. How do you see the adoption of AI systems in the Dutch legal market, not just e-discovery tools but also transactional document review and other uses?

The use of AI techniques to speed up legal processes is quickly gaining acceptance in the Netherlands. I don’t think there are significant differences from the rest of the world. As everywhere, security, defensibility, transparency and compliance are key in business-critical processes like eDiscovery, answering regulatory requests, dealing with public disclosure acts, and other complex transactions and investigations. The basis of acceptance is therefore trust and the ability to provide the best truth-finding tools available.