Thomson Is Coming, TR’s Own Legally-Trained LLM

Thomson Reuters (TR) is getting ready to launch ‘Thomson’, its own legally-trained LLM, this summer, built using opensource models, its huge data store, and its many experts’ input. It will support CoCounsel and will boost both TR’s contract and legal research abilities. AL spoke to CTO Joel Hron about the move.

It all started in 2024, when TR bought UK-based Safe Sign, which had developed a small legal language model. But, as Hron explained, even back then TR knew what it was doing: setting in motion the development of its own generative AI LLM, based upon opensource systems, such as those of Meta or Mistral, and tapping its vast stores of legal data for pre-training and post-training.

The goal: to build their own LLM that would out-perform the general models.

The result: Thomson is now very close to going live, and TR has achieved much of what it set out to do.

Importantly, Hron notes that Thomson, which could in theory be operated on premises inside a major law firm or large inhouse legal team’s office, has shown improvements compared to general models on legal tasks.

While the general aspects of the LLM are needed because… well, it’s a language model… the additional legal input, at the massive scale that TR can provide across case law and contracts, plus some additional input on tax matters and other areas that TR works in, such as the Reuters news world, has helped produce very promising results.

Hron says that, using their internal benchmarks along with public ones, they already see Thomson doing better than general models on legal tasks in four of ten key areas, and are looking to achieve this in the other six.

He adds that once Thomson is formally launched, many users of CoCounsel, for example while doing a doc review, may not know it is at work there, but it will be. Those who want something on prem will naturally be aware of Thomson, but either way, Hron stresses that having their own TR legal LLM means better security and privacy overall.

As to why it’s better than general models, Hron notes that it’s best to think of Thomson as a huge letter T, with broad arms representing general language understanding, math and coding skills, and then the column of the T as the very deep legal training data and expert input.

And on the opensource models, Hron underlines that they are just a set of ‘model weights – when you train a transformer model, it’s just a matrix of numbers. And then it’s about tweaking those numbers [with legal training data and experts]’.

That means that Thomson is ‘portable’ and can switch which opensource model it’s working with. In other words, they’re not stuck with any particular foundational system. Meanwhile, Thomson’s weights get better as more data and more training come into the LLM.

In terms of the hardware, they have deals with longstanding service providers, which have the GPU clusters they need to do this. And again, Hron notes the privacy benefits of this.

‘Thomson is our model, using our data, our network, and we can serve this model out,’ he says. ‘You could deploy on premises and lots of companies are really focused on that.’

So, where is this all going?

AL and Hron explore the point that Thomson will only get better, to which he then adds: ‘Thomson Reuters’ job is expert knowledge delivery, and this AI model is a new way of doing this, it’s a profound way of doing this, and it’s a durable way to embed our knowledge.’

Is this a hedge against OpenAI or Anthropic launching a legal vertical capability? Hron says he doesn’t believe those companies want to do that, and TR will keep working with them.

This is not meant to be a defence, as such, but rather to build something additive that leverages the one thing that TR really has tons of: legal data. Plus, add in tax advisory data and also information from the huge Reuters news agency, and you have a compelling offering that will work in the background with the tools that the company already offers.

Conclusion

Some have argued that legal LLMs can never compete with a general model. But, if you have a good general model, and have legal data on the terabyte scale that TR has, then maybe you can. Hron and TR believe that they have achieved better performance on legal tasks than general models across a range of benchmarks.

And although TR is not saying this is a moat against Anthropic going into legal, it certainly helps them nevertheless. Also, it gives them – if it turns out this summer that performance really is notably better than before in areas such as contract review – a potential advantage against the multiple legal AI challengers in the market that work very much in the transactional space. On the research side, users may simply notice better responses, as many lawyers will be using TR, LexisNexis and Clio (now with vLex) for a lot of their legal research needs regardless.

Therefore, it’s perhaps on the transactional / contracts side where Thomson will give TR the greatest boost.

Plus, Thomson will only get better. There will be more legal data flowing into TR. There will be more training. There will be more expert input. And they can keep moving to new opensource general models as things evolve. Maybe the naysayers will have to think again about legal LLMs?

Richard Tromans, Founder, Artificial Lawyer and Legal Innovators conference Chair.
