Suffolk Law School Uses Reddit to Create Legal Question A2J Taxonomy

A collaboration between Suffolk Law School’s Legal Innovation and Technology Lab in the US and Stanford Law School’s Legal Design Lab, with funding from The Pew Charitable Trusts, is taking legal questions that consumers have posted on social media site Reddit and using them to create a taxonomy of legal issues to help train access to justice (A2J) tech applications.

Sounds unusual? At first it does, but when you look deeper it all makes sense. David Colarusso, Director of Suffolk University Law School’s Legal Innovation and Technology Lab, explained to Artificial Lawyer what this is all about.

Reddit is a site where hundreds of people post their legal questions – among other things – asking the community for advice and direction, or just to validate that their issue is really a legal matter. The project team is taking all these questions – minus any identifying information – and putting them into groups, e.g. family – divorce, or property – eviction, and so on.
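As a rough illustration of that grouping, the two-level labels mentioned above (family – divorce, property – eviction) could be held in a simple nested structure. This is only a minimal sketch: the `TAXONOMY` dictionary and the `label_path` helper are hypothetical names made up for this example, and nothing here reflects how the project actually stores its taxonomy.

```python
# Hypothetical two-level taxonomy. Only these two area/issue pairs
# come from the article; the structure itself is an assumption.
TAXONOMY = {
    "family": ["divorce"],
    "property": ["eviction"],
}


def label_path(area: str, issue: str) -> str:
    """Return an 'area - issue' label, or raise if the pair is not in the taxonomy."""
    if issue not in TAXONOMY.get(area, []):
        raise KeyError(f"{area!r} / {issue!r} is not in the taxonomy")
    return f"{area} - {issue}"


print(label_path("property", "eviction"))
```

A question that has had its identifying details removed would then simply carry one or more of these labels.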

The beauty of this is that many A2J issues never go to court, as most people don’t access the justice system because lawyers are too expensive (heard that before, yep?). So there is no record of their needs or their issues, and you cannot build a really solid A2J application if you don’t know what people really want. But Reddit knows…

The team then digs deeper into the more granular aspects of the questions, sorting them into key areas and issues. This will also help A2J experts understand what is out there in terms of the legal issues people face.

We can all guess that things like divorce may be a hot topic for online legal sites, but there are dozens of other subjects that don’t always seem to have a clear label, for example a range of legal issues that stem from the use of social media itself, from ‘sextortion’ to social media ID theft.

OK, so that’s the training data. But what are they using it all for? Colarusso said: ‘We are looking at how we can use machine learning tools and how people use online legal resources. But, there is a lack of a taxonomy to train these systems.’

I.e. how can you build a comprehensive A2J corpus of useful answers if you don’t know what people are really asking about? Just pumping out bland legal information won’t do much good.

Also, if you are making an A2J legal bot, as Law Droid does, then you need lots of training material to decide how your bot should best answer.

And, simply put, doing solid market research about what the clients – in this case the general public – want makes perfect sense.

They will then use a gamification platform to encourage people to help sort the tens of thousands of legal queries, classifying them so that they can be placed in the taxonomy.

‘We’ve put together a game. You go and look at questions and it asks do you see that issue and you earn points by labelling things,’ he explained.
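To make the mechanics concrete, here is a minimal sketch of how such a labelling game might tally points and turn players' answers into labels. Everything specific in it is an assumption for illustration: the one-point-per-answer scoring, the three-vote minimum, the majority rule and the `tally` function are hypothetical, not details of the actual game.

```python
from collections import Counter, defaultdict

POINTS_PER_ANSWER = 1  # assumed scoring rule, not from the article
MIN_VOTES = 3          # assumed minimum answers before a label is decided


def tally(votes):
    """Score players and decide labels from yes/no answers.

    votes: iterable of (player, question_id, issue, saw_issue) tuples.
    Returns (scores, labels): points per player, and for each question
    the set of issues that a strict majority of at least MIN_VOTES
    players said they saw.
    """
    scores = Counter()
    counts = defaultdict(lambda: [0, 0])  # (question, issue) -> [yes, total]
    for player, question_id, issue, saw_issue in votes:
        scores[player] += POINTS_PER_ANSWER
        counts[(question_id, issue)][0] += int(saw_issue)
        counts[(question_id, issue)][1] += 1

    labels = defaultdict(set)
    for (question_id, issue), (yes, total) in counts.items():
        if total >= MIN_VOTES and yes * 2 > total:
            labels[question_id].add(issue)
    return scores, dict(labels)


# Made-up answers from three made-up players.
votes = [
    ("ann", "q1", "eviction", True),
    ("bob", "q1", "eviction", True),
    ("cat", "q1", "eviction", False),
    ("ann", "q1", "divorce", False),
]
scores, labels = tally(votes)
print(scores["ann"], labels)
```

Requiring several independent answers before accepting a label is one common way to smooth out individual mistakes in crowd-sourced sorting; a real system might weight players or use a different threshold.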

However, he points out that at this stage they are ‘not looking at the answers – but the questions to do the labelling’ – so that others can come along and provide the answers later. But knowing what the right questions are, and where people really do need help, matters.


(Main photo credit: David Colarusso – and what an amazing pic.)