The Open Data Institute (ODI), a non-profit think tank co-founded by Sir Tim Berners-Lee, has launched an ‘AI Transparency Index’ and made five key recommendations to ensure generative AI’s weaknesses and legal risks are addressed.
In a new white paper, ‘Building a better future with data and AI’, the ODI states that the potential for emerging AI technologies to transform industries such as diagnostics and personalised education ‘shows great promise’. Yet significant challenges remain.
Inadequate data governance can lead to ‘biases and unethical practices, undermining the trust and reliability of AI applications in critical areas such as healthcare, finance, and public services’, they added.
These risks are exacerbated by a lack of transparency that is ‘hampering efforts … to ensure compliance with legal standards’. Hence the launch of the AI Transparency Index to track the sector.
The paper adds that ‘data [which is used for AI training] must meet agreed standards, which require a data assurance and quality assessment infrastructure. AI training datasets typically lack robust governance measures throughout the AI life cycle, posing safety, security, trust, and ethical challenges related to data protection and fair labour practices’.
(And in this case, when they say ‘fair labour practices’, they perhaps mean approaches by AI companies that don’t, in effect, steal articles, books, scientific papers, music and other ‘content’ from authors and creators, use that data to train their models, and then sell what they’ve stolen back to the market via prompt-based regurgitation. Which, of course, leads to job losses in areas where we really should not be taking the human out of the equation, and to the poisoning of one’s own cultural landscape… so, there’s that.)
To help address this and several other problems, the ODI recommends that the UK – and any other nation – focus on the following five things:
1. Ensure broad access to high-quality, well-governed public and private sector data to foster a diverse, competitive AI market;
2. Enforce data protection and labour rights in the data supply chain;
3. Empower people to have more of a say in the sharing and use of data for AI;
4. Update our intellectual property regime to ensure AI models are trained in ways that prioritise trust and empowerment of stakeholders;
5. Increase transparency around the data used to train high-risk AI models.
Commenting on the recommendations, Sir Nigel Shadbolt, Executive Chair & Co-founder of the ODI, said: ‘Government[s] must look beyond the hype and attend to the fundamentals of a robust data ecosystem built on sound governance and ethical foundations.
‘We must build a trustworthy data infrastructure for AI because the feedstock of high-quality AI is high-quality data. The UK has the opportunity to build better data governance systems for AI that ensure we are best placed to take advantage of technological innovations and create economic and social value whilst guarding against potential risks.’
Other insights from the ODI’s research, which it frames in a UK context, include:
• The public needs safeguarding against the risk of personal data being used illegally to train AI models. Steps must be taken to address the ongoing risk of generative AI models inadvertently leaking personal data through clever prompting by users. Solid, the decentralised personal data project, and other privacy-enhancing technologies have great potential to help protect people’s rights and privacy as AIs become more prevalent.
• Key transparency information – about data sources, copyright, the inclusion of personal information, and more – is rarely provided for systems flagged in the Partnership on AI’s AI Incident Database.
• Intellectual property law must be urgently updated to protect the UK’s creative industries from unethical AI model training practices.
• Legislation safeguarding labour rights will be vital to the UK’s AI Safety agenda.
• The rising price of high-quality AI training data excludes potential innovators like small businesses and academia.
—
Overall, these are all sensible and important goals. The main challenge is that the genAI horse has already bolted from the stables and is charging down the road at full speed. Moreover, many thousands of people around the world are already using genAI systems built on, er… how can we put this politely… ‘borrowed’ data. And, of course, some well-known companies – mentioning no names – appear to be more than happy with this scenario, nonchalantly dealing with the fall-out of their actions after having already transgressed on a global scale.
But perhaps countries can draw a line in the sand and say ‘no more’? The EU is certainly trying.
One last point the ODI makes that is certainly relevant to the legal sector – beyond all the legal battles over ‘borrowed’ IP – is the one about standards and transparency. The ODI doesn’t really go into the accuracy and dependability of outputs, but clearly, if we are going to have an AI Transparency Index for the stuff that goes into making LLMs, then we may also need something similar to measure what comes out of the other end of the genAI ‘sausage factory’, i.e. standards and benchmarks for AI accuracy.
Any road, it all boils down to this: the horse has bolted, but should we try and chase after it? We either give up and let Sam Altman and pals do whatever they want from now on and live with the consequences, or we at least – as the EU is trying to – make an effort to restore some boundaries and set some standards.
This site would strongly favour chasing after the horse.