DeepJudge, a legal AI startup founded by AI researchers and former Google search engineers and focused on leveraging law firm and in-house internal knowledge, has raised $10.7m in an oversubscribed seed funding round. Additionally, Steve Obenski, a former senior leader at Kira Systems (which was acquired by Litera), has joined as interim Chief Strategy Officer to help spearhead its US expansion.
The Swiss company, founded by Paulina Grnarova, Yannic Kilcher, and Kevin Roth (pictured with added money sparkles), seeks to efficiently leverage the ‘troves of internal documents and data held by lawyers…this includes via document management systems, the Microsoft 365 ecosystem, and other tools’, they said.
Now, this would not be the first company to provide something that can do this type of task. The DMS pioneers such as iManage and NetDocuments have built out plenty of capabilities to support improved KM search. Meanwhile, Henchman (now part of LexisNexis) has pioneered the use of semantic clustering to help lawyers find related clauses in a DMS. Zuva – created by Noah Waisberg of Kira – also does plenty of work leveraging AI to help inhouse teams discover key information in contracts. And then there is Litera (which now owns Kira), which has several approaches to tapping KM data.
However, the company said that its ‘DeepJudge Knowledge Assistant, the only generative AI interface for law firms and legal departments that can instantly access the firm’s entire document knowledge base… uses RAG technology to provide the most relevant, up-to-date information, ensuring high-quality answers are grounded in each organization’s comprehensive institutional knowledge’.
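For readers less familiar with the mechanics, retrieval-augmented generation (RAG) simply means retrieving the documents most relevant to a question and handing them to a language model as grounding context, so the answer is drawn from the firm’s own material rather than from the model’s general training. The toy sketch below illustrates that basic flow; the function names, index and documents are invented for the example and have nothing to do with DeepJudge’s actual system.

```python
# Hypothetical sketch of a basic RAG flow: retrieve relevant passages, then
# build a prompt that grounds the generated answer in those passages only.

def retrieve(question: str, index: dict[str, str], top_k: int = 3) -> list[tuple[str, str]]:
    """Naive keyword-overlap retrieval over an in-memory document index."""
    terms = set(question.lower().split())
    scored = []
    for doc_id, text in index.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id, text))
    scored.sort(reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_k]]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Assemble a prompt that tells the model to answer only from the retrieved passages."""
    context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer the question using only the passages below, citing document IDs.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Example usage with a toy knowledge base:
index = {
    "precedent-042": "Indemnity clause capping liability at twelve months of fees.",
    "memo-017": "Summary of Swiss data protection obligations for client portals.",
}
question = "What is the liability cap in the indemnity clause?"
prompt = build_prompt(question, retrieve(question, index))
# `prompt` would then be sent to a generative model of the firm's choosing.
```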
In February, the company announced that Tony Ensinger, previously Head of Sales at Casetext (acquired by Thomson Reuters), had joined as its SVP of Sales and Product Strategy. The company’s advisors include Jan Puzicha, one of the founders of Recommind (which was acquired by OpenText).
Artificial Lawyer put a few questions to CEO Paulina Grnarova.
– How does it use RAG – i.e. what methodologies does your RAG approach use?
Our approach is unique in that we are hyper-focused on quality retrieval – the “R” in RAG. If the generative interface isn’t provided with relevant results, it’s obviously not going to have any hope of correctly answering the user’s question. The way we improve retrieval is with our proprietary search technology, which is able to index a firm’s entire DMS, as well as other systems such as SharePoint and the HighQ site.
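To make that concrete for non-technical readers, here is a rough, hypothetical sketch of what a single search layer over several repositories can look like. The connector names and the crude term-overlap scoring are illustrative assumptions on our part, not a description of DeepJudge’s proprietary search technology.

```python
# Hypothetical sketch: one search layer over several document repositories
# (e.g. a DMS, SharePoint, HighQ). Names and scoring are illustrative only.
from dataclasses import dataclass


@dataclass
class Document:
    source: str   # which repository the document came from
    doc_id: str
    text: str


def crawl_repositories(repos: dict[str, list[dict]]) -> list[Document]:
    """Flatten documents from every connected repository into one corpus."""
    corpus = []
    for source, docs in repos.items():
        for d in docs:
            corpus.append(Document(source=source, doc_id=d["id"], text=d["text"]))
    return corpus


def search(corpus: list[Document], query: str, top_k: int = 5) -> list[Document]:
    """Rank every document by naive term overlap with the query, best first."""
    terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


# Example: the same query runs over DMS, SharePoint and HighQ content at once.
repos = {
    "dms": [{"id": "A1", "text": "Share purchase agreement with earn-out provisions."}],
    "sharepoint": [{"id": "B7", "text": "Playbook for negotiating earn-out clauses."}],
    "highq": [{"id": "C3", "text": "Client site collecting closing checklists."}],
}
results = search(crawl_repositories(repos), "earn-out clause playbook")
```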
– How accurate is it and how do you measure this?
We have a diverse set of internal evaluation metrics, but our point of view is that any one measure of accuracy is not that informative. Instead, we would like to focus on the usefulness of the tool to the end user. A major component in our product centres around keeping the human in the loop and providing users with visibility into every step of the RAG process, explaining what search results went into a particular generated output, how relevant those results are, and which parts of the results influenced which parts of the output. This visibility provides the necessary trust that is absent in black-box RAG systems.
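As an illustration of what that kind of per-answer visibility can look like in practice, here is a rough, hypothetical sketch that maps each sentence of a generated answer back to the retrieved passage that best supports it. The crude word-overlap matching is an assumption made purely for the example; DeepJudge has not described its actual attribution mechanism.

```python
# Hypothetical sketch: attribute each sentence of a generated answer to the
# retrieved passage it overlaps with most, so a reviewer can check sources.

def overlap_score(a: str, b: str) -> int:
    """Count shared words between two pieces of text (a crude relevance proxy)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def attribute_answer(answer: str, passages: dict[str, str]) -> list[dict]:
    """For every sentence in the answer, record the best-matching source passage."""
    report = []
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        best_id = max(passages, key=lambda pid: overlap_score(sentence, passages[pid]))
        report.append({
            "sentence": sentence,
            "source": best_id,
            "score": overlap_score(sentence, passages[best_id]),
        })
    return report


# Example usage with an invented answer and two retrieved passages:
passages = {
    "precedent-042": "Liability is capped at twelve months of fees under the indemnity clause.",
    "memo-017": "Swiss data protection duties apply to hosted client portals.",
}
answer = (
    "The indemnity clause caps liability at twelve months of fees. "
    "Data protection duties also apply."
)
for row in attribute_answer(answer, passages):
    print(row["sentence"], "->", row["source"], f"(overlap={row['score']})")
```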
– If a law firm does not allow you to use their DMS, can this tool still be used?
DeepJudge doesn’t require a DMS connection to be useful. We have customers that are using it only for other repositories such as client portals or Microsoft repositories. Though, in the vast majority of cases, the group of repositories that we are indexing includes the DMS.
So, there you go.
The round was led by Coatue, with participation from several angel investors.
P.S. Got to say that focusing on ‘usefulness to the user’ rather than any publicly shared measure of accuracy withholds details that could be useful to potential buyers. Different lawyers will have very different experiences and expectations when using a genAI tool. Some may be very happy to get the most basic responses. Others may want things that are not even possible. Potential buyers will also not have any yardstick to measure the product’s accuracy beforehand. Perhaps users would actually pick another tool if they could compare like with like beforehand? Surely some kind of measure or benchmark would help lawyers and innovation teams to gain at least an element of objective judgment? That’s not to say that the ability to deliver ‘answer usefulness’ is not valid – it is. But it will always be a very subjective experience felt on a lawyer-by-lawyer basis and thus hard to turn into a meaningful, shareable measure.