The Open Data Institute (ODI), co-founded by the inventor of the web Sir Tim Berners-Lee, is to launch two of the first ever government-backed Data Trusts in the world, with the purpose of training AI systems underpinned by a specific legal structure.
A Data Trust is a legally defined collection of data, often derived from public or government-connected sources, that is then used to train machine learning systems in order to produce useful applications (see formal definition below).
The idea has been floated for some years as the best means of building really useful AI applications that can operate at city or national level and that are derived from very large, real-world data sets, e.g. how millions of people might use a certain public service.
The problem has been that, understandably, public bodies and others have been uncertain about how to do this, or even whether they should do it at all, because of data privacy issues.
Data Trusts, which are legally constructed entities, are seen as the answer and help form a regulated bridge between the collected data and the AI companies (or other tech companies such as smart contract developers), while retaining public trust.
One of the planned Data Trust projects will involve the Mayor of London and the Royal Borough of Greenwich. London’s City Hall will be working with the ODI as part of its Smarter London Together Roadmap to support AI and protect ‘privacy by design’ for Londoners.
This Greenwich project will focus on ‘real time data from IoT devices and sensors, and will investigate how this data could be shared with innovators in the technology sector to create solutions to city challenges’.
Some of the data collected will cover: energy use, parking space occupancy and the impact of different weather patterns.
The UK’s Secretary of State for Digital, Culture, Media and Sport Jeremy Wright QC yesterday announced that the ODI will now lead two pilot projects in conjunction with the Government’s Office for Artificial Intelligence.
It wasn’t totally clear what the second of the pilot Data Trusts will be, but the Government said that it may cover areas such as: data about cities, the environment, biodiversity, and transport.
Digital Secretary Jeremy Wright said: ‘We are a world-leader in artificial intelligence and our modern Industrial Strategy puts pioneering technologies at the heart of our plans to build a Britain which is fit for the future. But it is crucial that the public have confidence it is being used to improve people’s lives and we have the right expertise and framework in place to maximise its potential.’
‘I am pleased we have secured global leaders from academia and industry to work alongside us as we develop the world’s first Centre for Data Ethics and Innovation and explore the potential of data trusts,’ he added.
These appear to be very early steps, but many AI and scientific research companies are very interested in the potential to tap huge amounts of data at a national or citywide scale. One example that is often given is pouring the NHS’s data into a Data Trust (or series of Data Trusts) to help develop a wide range of AI-based applications that could help improve efficiency.
It will be very interesting to see how this develops, not least for law firms that may come to develop this as an area of expertise and then advise on it at a global level.
–
Definition of a Data Trust by the ODI
A data trust takes the concept of a legal trust and applies it to data.
Historically, trusts have been used in law to hold and make decisions about assets such as property or investments. A data trust takes this concept of holding something and making decisions about its use, but applies it to data. It is a legal structure that provides independent stewardship of some data for the benefit of a group of organisations or people.
That benefit might be to create new businesses, help research a medical disease, or empower a community of workers, consumers or citizens.
In a data trust, the trustors may include individuals and organisations that hold data. The trustors grant some of the rights they have to control the data to a set of trustees, who then make decisions about the data – such as who has access to it and for what purposes.
The beneficiaries of the data trust include those who are provided with access to the data (such as researchers and developers) and the people who benefit from what they create from the data.
The trustees take on a legally binding duty to make decisions about the data in the best interests of the beneficiaries. This is sometimes referred to as a fiduciary duty. Proponents of data trusts suggest this duty would help to increase the trust that individuals and organisations have in the way data is used.
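To make the three roles concrete, here is a minimal sketch in Python of how trustors, trustees and beneficiaries might relate. It is an illustration only, not an ODI specification: the class names, the `allowed_purposes` rule and the example parties are all hypothetical, and a real fiduciary decision would involve far more than a purpose check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRequest:
    requester: str  # a would-be beneficiary, e.g. a researcher or developer
    dataset: str    # name of the data being asked for
    purpose: str    # what the requester says the data will be used for


@dataclass
class DataTrust:
    purpose: str
    trustees: set = field(default_factory=set)
    # Hypothetical simplification: trustees pre-agree a set of acceptable purposes.
    allowed_purposes: set = field(default_factory=set)
    # Maps dataset name -> the trustor who granted control rights over it.
    stewarded: dict = field(default_factory=dict)

    def grant_rights(self, trustor: str, dataset: str) -> None:
        """A trustor hands some control over a dataset to the trust."""
        self.stewarded[dataset] = trustor

    def decide(self, trustee: str, request: AccessRequest) -> bool:
        """Only a trustee may decide who gets access, and for what purpose."""
        if trustee not in self.trustees:
            raise PermissionError(f"{trustee} is not a trustee of this trust")
        return (
            request.dataset in self.stewarded
            and request.purpose in self.allowed_purposes
        )


trust = DataTrust(
    purpose="city energy planning",
    trustees={"trustee_a"},
    allowed_purposes={"research"},
)
trust.grant_rights("borough_council", "parking_occupancy")

approved = trust.decide(
    "trustee_a", AccessRequest("uni_lab", "parking_occupancy", "research")
)
rejected = trust.decide(
    "trustee_a", AccessRequest("ad_firm", "parking_occupancy", "marketing")
)
```

In this sketch the trustor never deals with the requester directly. The decision sits with the trustee, which is the independent stewardship the definition describes.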
Supporting good uses of data trusts and protecting against bad ones
A data trust forms part of our data infrastructure, so we would expect it to align with our principles for good data infrastructure. These principles are intended to maximise value by helping to create an open and trustworthy data ecosystem. They include measures such as building with the web and designing in ways that can adapt to changing situations.
In addition to these principles, a data trust needs some specific characteristics. These characteristics are intended to help support good uses of data trusts, and protect against bad ones. We believe a data trust must have:
- a clear purpose
- a legal structure (including trustors, trustees with fiduciary duties and beneficiaries)
- (some) rights and duties over stewarded data
- a defined decision making process
- a description of how benefits are shared
- sustainable funding
When we have discussed these with some people they have also asked about other characteristics which we do not think are useful or necessary. For example, we think a data trust can use any form of technology for data storage and access; that data trusts can steward data which is either shared or open; and that data trusts could include data from the public, private or third sectors.
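The six characteristics listed above can be read as a simple checklist. The Python sketch below assumes a proposed trust is described as a dictionary of yes/no answers. The key names and the `missing_characteristics` helper are hypothetical and are not an ODI assessment method, which the ODI says it has yet to detail.

```python
# Hypothetical keys, one per characteristic in the ODI list.
REQUIRED_CHARACTERISTICS = (
    "clear_purpose",
    "legal_structure",
    "rights_and_duties_over_data",
    "defined_decision_making_process",
    "benefit_sharing_description",
    "sustainable_funding",
)


def missing_characteristics(description: dict) -> list:
    """Return the characteristics that are absent or answered 'no'."""
    return [name for name in REQUIRED_CHARACTERISTICS if not description.get(name)]


# An invented example: everything is in place except funding.
candidate = {
    "clear_purpose": True,
    "legal_structure": True,
    "rights_and_duties_over_data": True,
    "defined_decision_making_process": True,
    "benefit_sharing_description": True,
}
gaps = missing_characteristics(candidate)
```

Note that the checklist says nothing about storage technology, open versus shared data, or sector, in line with the ODI's view that these are not defining characteristics.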
Piloting a data trust
Words, descriptions and motives are fine, but we now need to find out if this model works in the real world and if it meets our goal of increasing access to data while retaining trust. To do this, we need to test it by running some pilots. This is what we will be doing in the coming months.
In the pilots, we will explore a range of research questions such as “how do the roles of data controllers and data processors from data protection legislation map to the roles in a data trust?”, “what is the cost of running a data trust?” and “how can a data trust be stopped?”. If we determine the model is useful, we would also want to learn how to make it repeatable and scalable by as many people as possible.
We are not the only people exploring these questions. Sean McDonald and Bianca Wylie, for example, are currently experimenting with data trusts and different governance models for data. We have built on the work they and others have shared.
As always we will work as openly as possible and also share the lessons we learn. If we find our work is useful then we will also provide more detail on the characteristics and how to assess data trusts against them.
Do get in touch at policy@theodi.org if you want to know more, work with us, or to share your own stories about exploring the same topics.
This is a fantastic idea, and it solves a lot of challenges for many people. All we need to see now is AI companies willing to integrate with such data trusts and, given the portability of data, I presume law firms will be more willing to adopt their tech.