By Sergii Shcherbak, lawyer and Head of Software Development at Synch
Data processing agreements (DPAs) take plenty of time to review, as demonstrated by legal departments at large European companies where the DPA throughput is measured in hundreds a month.
DPAs are lengthy and complex contracts concluded between a company acting as data controller and a contractor acting as data processor. They regulate how the parties process personal data and formalize their roles and obligations within the scope of that processing.
The requirements of the General Data Protection Regulation (GDPR) for the processing of personal data are detailed and comprehensive, and additional requirements apply to the content of DPAs. In our experience, review of a single DPA by an in-house legal department can take up to 20 hours.
There is a clear need for a solution that would assist lawyers in DPA review and optimize the whole process. We at Synch, the digital law firm behind legal AI tools such as PrivacyPolicyCheck.Ai, decided to embark on the challenge of building such a product by leveraging our software development and legal expertise.
It was important for us to get input from our customers, who had expressed the need to optimize their DPA review processes, on what a solution to this problem should look like. Taking that feedback into account, we aimed to build a tool that:
- conducts a comprehensive GDPR compliance review of a DPA;
- uses state-of-the-art deep learning architecture;
- is accurate (more than 90%);
- is fast: it should take a few seconds per review;
- can handle high throughput of documents: the solution will be used at large companies with vast amounts of DPAs to review;
- is easy to use: it should take 2-3 mouse clicks to receive a compliance report;
- has several interfaces depending on the client’s preferences: some clients prefer a web interface, others feel more comfortable using a Microsoft Word add-in, and for some customers the simplest way to use a service like this is to send an email with a DPA attachment and receive results back;
- has an API for client software integrations;
- can be used on-premise (offline), in case the client’s security standards do not allow for a SaaS solution; and
- has explainable AI (not a ‘black box’).
The product of our development efforts is DPA AI, which meets all the above criteria. DPA AI makes use of state-of-the-art deep learning technology to review DPAs for compliance with the GDPR.
The goal of DPA AI is not to replace review by a lawyer completely, but to augment it and save time in the process. By using DPA AI, our customers aim to save up to 75% of time spent on manual contract review.
Analysis of a single DPA takes a few seconds, be it a five-page document or a 50-page contract, a well-formatted DOCX file or a PDF with scanned pages. And because it is so fast, DPA AI can handle high throughput of incoming DPAs with no service congestion.
When review of the submitted document is complete, users are presented with a detailed compliance report which sets out what the DPA got right, what the issues and risks are, and how to fix them. The tool is available as a web application, as a Microsoft Word add-in, via email, via API, and as an on-premise software package. This variety of interfaces gives clients more choices for integrating DPA AI into their workflows.
DPA AI was built with AI transparency in mind: users can see why the neural network arrived at a certain conclusion by observing the extracted clauses and having access to a comprehensive document with all the extracted information and associated compliance logic, in addition to the standard compliance report.
There are three main steps taken by DPA AI after it has accepted a DPA for evaluation:
- clause extraction: tag all relevant clauses and group them depending on the GDPR requirement;
- named entity recognition (NER): analyze the extracted clauses and tag wording that represents specific GDPR concepts (entities), such as the type of personal data (e.g. ‘email’, ‘username’) or the processing period (e.g. ‘within 1 year from the date of termination’); and
- compliance evaluation: based on the extracted information, ‘connect the dots’ and provide the user with a detailed GDPR compliance report which sets out what in the DPA is correct and why, what is wrong and why, whether there are any risks or clauses that require a separate review, and how to fix the identified issues.
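To make the three steps concrete, here is a minimal sketch in Python. It is not how DPA AI is implemented: keyword cues and regular expressions stand in for the trained neural networks, and every name in it (REQUIREMENT_CUES, ENTITY_PATTERNS, Clause, review, the requirement labels and the report wording) is a hypothetical chosen for illustration only.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical requirement labels with keyword cues. Assumption: the real
# system uses trained neural networks here, not keyword matching.
REQUIREMENT_CUES = {
    "data_types": ("personal data",),
    "retention": ("delete", "return", "termination"),
}

# Hypothetical entity patterns standing in for the NER step.
ENTITY_PATTERNS = {
    "personal_data_type": re.compile(r"\b(?:email|username)\b", re.IGNORECASE),
    "processing_period": re.compile(r"within \d+ (?:day|month|year)s?", re.IGNORECASE),
}


@dataclass
class Clause:
    text: str
    requirement: str
    entities: Dict[str, List[str]] = field(default_factory=dict)


def extract_clauses(document: str) -> List[Clause]:
    """Step 1: tag relevant sentences and group them by requirement."""
    clauses = []
    for sentence in re.split(r"(?<=\.)\s+", document.strip()):
        lowered = sentence.lower()
        for requirement, cues in REQUIREMENT_CUES.items():
            if any(cue in lowered for cue in cues):
                clauses.append(Clause(sentence, requirement))
                break  # first matching requirement wins in this toy version
    return clauses


def recognize_entities(clauses: List[Clause]) -> List[Clause]:
    """Step 2: tag entity wording inside each extracted clause."""
    for clause in clauses:
        for label, pattern in ENTITY_PATTERNS.items():
            matches = [m.group(0) for m in pattern.finditer(clause.text)]
            if matches:
                clause.entities[label] = matches
    return clauses


def evaluate_compliance(clauses: List[Clause]) -> Dict[str, str]:
    """Step 3: combine the extracted information into a simple report."""
    report = {}
    for requirement in REQUIREMENT_CUES:
        found = [c for c in clauses if c.requirement == requirement]
        if not found:
            report[requirement] = "missing: no clause found"
        elif not any(c.entities for c in found):
            report[requirement] = "review: clause found, no specifics"
        else:
            report[requirement] = "ok"
    return report


def review(document: str) -> Dict[str, str]:
    """Run the three steps in order on one document."""
    return evaluate_compliance(recognize_entities(extract_clauses(document)))
```

Calling review() on a short text returns one verdict per requirement. The point of the sketch is the data flow: each step consumes the output of the previous one, and the report can always be traced back to the clauses and entities that produced it.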
Depending on the step, a separate set of neural networks, with a task-specific architecture, is used. All the neural networks were custom-built and trained on tens of thousands of DPA clauses provided by our customers and reviewed and labeled by our lawyers.
As regards accuracy, our model reaches more than 95% on a test dataset, which is a 10% slice of the complete dataset of tens of thousands of labeled clauses and entities.
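For readers less familiar with this kind of evaluation, the sketch below shows one common way to set aside a 10% test slice and compute accuracy on it. It is an illustration under assumptions, not a description of our training code: the function names, the random shuffling, and the fixed seed are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def split_holdout(examples: Sequence, test_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle the labeled examples and set aside a test slice.

    Assumption: a simple random split; the original article does not say
    how the 10% slice was drawn.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predictions: Sequence, labels: Sequence) -> float:
    """Share of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

The model is trained only on the first list returned by split_holdout and scored with accuracy() on the second, so the reported figure reflects clauses the model has not seen during training.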
The DPA AI neural networks have been subject to thorough testing. For example, to benchmark our clause extraction neural network, we separately developed three new models based on the latest NLP neural network architecture – Google’s BERT (Bidirectional Encoder Representations from Transformers), which is now used in Google Search.
BERT took the NLP world by storm in 2018 and has since been the subject of much experimentation and development, resulting in architectures such as ALBERT (also by Google), RoBERTa (by Facebook AI), and DistilBERT (by Hugging Face), all three developed and open-sourced in summer/autumn 2019.
The first two architectures are among the top 10 NLP models in the world. Having benchmarked our clause extraction model against the new BERT-based neural networks, which we trained on our own labeled data, we saw almost identical results, except that our model is 4 times faster and much smaller (‘lighter’).
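A speed comparison of this kind can be sketched as follows. The helper names and the idea of passing each model in as a plain Python callable are assumptions made for illustration; the article does not describe the actual benchmarking harness.

```python
import time
from typing import Callable, Sequence


def mean_latency(predict: Callable[[str], object], inputs: Sequence[str],
                 repeats: int = 3) -> float:
    """Average wall-clock seconds per input over several passes."""
    if not inputs or repeats < 1:
        raise ValueError("need at least one input and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for text in inputs:
            predict(text)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))


def speedup(baseline_latency: float, candidate_latency: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_latency / candidate_latency
```

Both models would be timed on the same clauses with mean_latency(), and speedup() then gives the ratio. A value of 4.0 corresponds to a model that is 4 times faster than the baseline.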
DPA AI currently accepts DPAs in English, but the technology can be adapted to a different language, jurisdiction, or document type if enough data is available for training.
DPA AI has been developed in close collaboration with such large corporate clients as Husqvarna, Dustin, and Bankgirot, which provided feedback and data for labelling and are currently using DPA AI internally. All elements of the DPA AI functionality are the result of joint efforts of Synch’s software development and legal departments.
(This is an educational guest post intended to help readers gain a better understanding of how this new product works.)