India Joins Legal-Specific LLM Movement With Lexlegis.AI

Lexlegis.AI, a Mumbai-based legal research company, has launched what appears to be a legal-specific LLM. It is trained on over 10 million Indian legal documents. The genAI model can be used for case search with citations, and soon for document insights and drafting.

The company traces its roots back 25 years, to when it began compiling Indian court case law, often related to tax disputes, and it has since grown into a large-scale legal data business. It has not given ‘behind the scenes’ details of how it built the LLM, and it’s not clear what architecture the model is based on, or whether it works in unison with other LLMs to get its results. Even so, the training focus has been on legal texts from one jurisdiction: India.

They state that the LLM, which has now formally launched, was trained on over 20 billion tokens, which is a comparatively small training set. For example, Equall’s two latest legal-specific LLMs, or L-LLMs, are based on 54 billion and 141 billion tokens. Meanwhile, GPT-4 was reportedly trained on 13 trillion (!) tokens. So, perhaps we can call this a Legal-specific Small Language Model, or L-SLM?
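To put those token counts side by side, here is a minimal Python sketch that expresses each training set as a multiple of the Lexlegis.AI figure. The numbers are simply the ones quoted in this article (the GPT-4 figure is not an official number), and the labels for the two Equall models, as well as the function name, are placeholders made up for illustration.

```python
# Training-set sizes in tokens, as quoted in this article.
# The two Equall labels are hypothetical placeholders, not product names,
# and the GPT-4 figure is the one cited above, not an official number.
TRAINING_TOKENS = {
    "Lexlegis.AI": 20_000_000_000,
    "Equall (smaller model)": 54_000_000_000,
    "Equall (larger model)": 141_000_000_000,
    "GPT-4": 13_000_000_000_000,
}


def relative_size(baseline: str, tokens: dict[str, int]) -> dict[str, float]:
    """Return each training set as a multiple of the baseline's token count."""
    base = tokens[baseline]
    return {name: count / base for name, count in tokens.items()}


ratios = relative_size("Lexlegis.AI", TRAINING_TOKENS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:g}x the Lexlegis.AI training set")
```

On these figures, the Equall models come out at roughly 2.7 and 7 times the size of the Lexlegis.AI training set, while the GPT-4 figure is about 650 times larger, which is the gap behind the ‘L-SLM’ label.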

Interestingly, they add that they want to go international with the product next year. Moreover, lawyers with clients operating in India should already be able to get some value from this genAI system in combination with the company’s case law library.

The company also highlighted that using genAI could help ‘India’s judicial system [which is] currently grappling with over 44.9 million pending cases’. Faster research and drafting will certainly help, although such massive systemic bottlenecks will need more than AI to solve.

They also add that: ‘The AI’s capabilities extend beyond research; it can scan and analyze legal documents, identify relevant information, and flag inconsistencies, drastically reducing the time spent on manual reviews.’

And it is designed for ‘law firms, government departments, corporate legal teams, SMEs, and independent legal professionals’.

Saakar Yadav, Founder and CEO, said in a statement: ‘Lexlegis.ai is designed to empower judiciary and legal professionals with AI-driven tools that simplify and sharpen research. Our forthcoming features, Interact and Draft, will further streamline legal processes, enabling professionals to save time and improve accuracy. We intend to introduce more workflows and processes to practice management platform not only in India, but even in other countries in the near future.’

In the last couple of weeks this site has covered how Paris-based Equall launched two new L-LLMs, which it says compete well against general models. And last week Thomson Reuters bought Safe Sign, a startup that builds L-LLMs. TR did this to improve the performance of its existing approach, which relies on a combination of general LLMs.

As explored in several online discussions last week, it’s clear that this is not really a case of a battle to the death of general vs specific LLMs, but rather a question of how best to combine multiple genAI models, of whatever type, to get the maximum benefit.

Of course, some may argue that L-LLMs are pointless, as soon enough companies such as OpenAI will produce incredibly powerful new general LLMs, such as GPT-5, that will make the idea of specific training of models redundant. However, as the Stanford University study into legal genAI research tools has shown, even the best models on offer today, plus refinement and expertly-made system prompts, are not always reaching the accuracy levels many lawyers would like to see.

So, maybe it’s worth exploring what adding smaller models, and legal-specific models – and even perhaps much smaller, legal-specific models – to the mix, alongside general LLMs, can do…?

The counter-argument is that it’s best to just wait until GPT-5, or 6, or 7, arrives. But what if we wait and accuracy is still not where we want it to be for legal work? Maybe then we will see that combining a general model with legal-specific models – perhaps very niche, specific ones for complex legal tasks – is the better option?

The challenge is that we can either 1) sit on our hands and wait for the holy mega-LLM to arrive that cures all genAI sins, or 2) get on with experimenting with a range of approaches to see what can be achieved right now. The choice, as ever, is down to us.