This week the National Health Service (NHS) in the UK updated its 10 guiding principles for the use of AI, algorithms and data. And when an organisation of such importance – which holds petabytes of patient information – sets benchmarks for AI, then we should pay attention.
As most readers around the world will know, the NHS is the UK’s publicly funded health service, created in 1948 and regarded by many as one of the greatest achievements in this country’s history.
It also employs around 1.5 million people in the UK, has pharmaceutical and service contracts in the tens of billions of pounds, and — check out this number — ‘The NHS deals with over 1 million patients every 36 hours’.
Yep, 1 million people every 36 hours.
So what? What’s this got to do with AI, data and algos? In short, it’s because the NHS is also a data giant, producing oodles of information ranging from patient care, to cancer drug success, to biosample analysis, to vendor contract issues, and a hundred other areas.
The challenge: how does such a critically important part of the nation handle systems that automate, for example, the analysis of medical data? How does it work with AI companies? How does it share information? How does it protect itself from charges of negligence if automated tech isn’t perfect – which it never can be? And how can it be sure it is not wasting money on tech that won’t deliver what is wanted?
So, the NHS created a code of 10 key principles to help inform its culture and behaviour as an organisation. The first draft came out in 2018, and the principles have now been updated after stakeholder feedback.
These ideas may well become a model for many other sectors of the economy, including the law. In fact, any large organisation, from a law firm to an in-house legal team, should have a close read of the thoughts below. Just replace the word ‘patient’ with ‘client’ and ‘clinician’ with ‘lawyer’.
Or as the NHS says: ‘The code is designed to recognise that, while data-driven health and care technologies will undoubtedly deliver huge benefits to patients, clinicians, carers, service users and the system as a whole,
‘if we do not think about issues such as transparency, accountability, liability, explicability, fairness, justice and bias, it is also possible that the increasing use of data-driven technologies, including AI, within the health and care system could cause unintended harm.’
So, here are the NHS principles on AI and data.
The 10 Principles
1. Understand users, their needs and the context
Understand who specifically the innovation or technology will be for, what problems it will solve for them and what benefits they can expect. Research the nature of their needs, how they are currently meeting those needs and what assets they already have to solve their own problems. Consider the clinical, practical and emotional factors that might affect uptake, adoption and ongoing use.
2. Define the outcome and how the technology will contribute to it
Understand how the innovation or technology will result in better provision and/or outcomes for people and the health and care system. Define a clear value proposition with a business case highlighting outputs, outcomes, benefits and performance indicators.
3. Use data that is in line with appropriate guidelines for the purpose for which it is being used
State which good practice guideline or regulation has been adhered to in the appropriate use of data, such as the Data Protection Act 2018. Use the minimum personal data necessary to achieve the desired outcomes of the user’s needs and the context.
4. Be fair, transparent and accountable about what data is being used
Utilise data protection-by-design principles with data-sharing agreements, data flow maps and data protection impact assessments. Ensure all aspects of the Data Protection Act 2018 have been considered.
5. Make use of open standards
Utilise and build into the product or innovation current data and interoperability standards to ensure it can communicate easily with existing national systems. Programmatically build data quality evaluation into AI development so that harm does not occur if poor data quality creeps in.
6. Be transparent about the limitations of the data used and algorithms deployed
Understand the quality of the data and consider its limitations when assessing if it is appropriate for the users’ needs and the context. When building an algorithm, be clear about its strengths and limitations, and give clear evidence of whether the algorithm you have published is the algorithm that was used in training or in deployment.
7. Show what type of algorithm is being developed or deployed, the ethical examination of how the data is used, how its performance will be validated and how it will be integrated into health and care provision
Demonstrate the learning methodology of the algorithm being built. Aim to show in a clear and transparent way how outcomes are validated.
8. Generate evidence of effectiveness for the intended use and value for money
Generate clear evidence of the effectiveness and economic impact of a product or innovation. The type of evidence should be proportionate to the risk of the technology and its budget impact. An evidence-generation plan should be developed using the evidence standards framework published by NICE.
9. Make security integral to the design
Keep systems safe by safeguarding data and integrating appropriate levels of security into the design of devices, applications and systems, keeping in mind relevant standards and guidance.
10. Define the commercial strategy
Purchasing strategies should show consideration of commercial and technology aspects and contractual limitations. Consider only entering into commercial terms in which the benefits of the partnerships between technology companies and health and care providers are shared fairly.
This all looks very sensible. It’s focused on practical benefits, problem-solving and collecting economic evidence. Of course, whether the NHS, or any other large and complex organisation, can actually live up to these goals remains to be seen.
P.S. Given how much effort the NHS has put into these, it’s worth sharing some of the key points in further detail, especially Principles 6 and 7. Check them out below:
Principle 6: Be transparent about the limitations of the data used
‘The data used must be well understood and reviewed for accuracy and completeness. Accuracy is the closeness of agreement between a data value and its true value. Completeness is the presence of the necessary data. NHS Digital publishes a quarterly data quality and maturity index, which provides data submitters with transparent information.
A 2-stage approach is suggested when applying analytics to any data. Algorithms should be trained to understand the levels of data quality first and then achieve their objective by using the variables given. This 2-stage approach should be built in so that high fluxes in data quality are handled appropriately.
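The 2-stage approach described above can be sketched in a few lines of code: assess data quality first, and only apply the model to variables that pass the gate. This is a minimal illustration, not an NHS implementation – the completeness measure, field names and threshold are all our own assumptions.

```python
# A minimal sketch of the 2-stage idea: check data quality before
# applying a model. Thresholds and field names are illustrative only.

def completeness(records, fields):
    """Fraction of non-missing values per field (a simple quality metric)."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is not None) / n
            for f in fields}

def two_stage_predict(records, fields, model, min_completeness=0.95):
    # Stage 1: assess data quality and refuse to proceed if it is too poor.
    scores = completeness(records, fields)
    bad = {f: s for f, s in scores.items() if s < min_completeness}
    if bad:
        raise ValueError(f"Data quality gate failed: {bad}")
    # Stage 2: only now apply the model to the vetted records.
    return [model(r) for r in records]
```

Building the gate into the pipeline itself, rather than as a manual pre-check, is what lets sharp changes in data quality be handled appropriately rather than silently degrading predictions.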
Assessment of data quality should not be a one-off check – continuous anomaly detection should be in place to provide alerts to changes in a data source. NHS England and the UK Statistics Authority have produced guidance on data quality, which should be referred to.
Be aware of potential biases in the data used for training algorithms – consider the representativeness of the database used for training the algorithm. If the data provided for the AI to learn is limited to certain demographic categories or disease areas, this could potentially limit the applicability of the AI in practice as its ability to accurately predict could be different in other ethnic groups.
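The continuous anomaly detection mentioned above could be as simple as comparing each new batch’s summary statistic against its own history and alerting on large deviations. The z-score threshold and batch-mean statistic below are hypothetical choices for illustration, not anything the NHS specifies.

```python
# A minimal sketch of continuous anomaly detection on a data feed:
# alert when a new batch's statistic deviates sharply from history.
import statistics

class DriftMonitor:
    def __init__(self, z_threshold=3.0, min_history=5):
        self.history = []        # past batch statistics
        self.z = z_threshold     # how many standard deviations counts as anomalous
        self.min_history = min_history

    def check(self, batch_mean):
        """Record a new batch statistic; return True if it should raise an alert."""
        alert = False
        if len(self.history) >= self.min_history:
            mu = statistics.mean(self.history)
            sd = statistics.pstdev(self.history) or 1e-9  # avoid divide-by-zero
            alert = abs(batch_mean - mu) / sd > self.z
        self.history.append(batch_mean)
        return alert
```

In practice the same monitor would be run per field (missingness rate, mean, category mix) so that a change in any one data source triggers an alert rather than quietly distorting downstream algorithms.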
Good data linkage will avoid reducing data quality
There is a range of approaches for linking data, which can provide differing levels of accuracy and data loss. It is often necessary to strike a balance between good matching accuracy and loss of too many records. Consideration should be given to the effects of a selected linkage procedure on data quality. In particular, if the process could cause systematic loss or mismatching of a particular type of record, this could have downstream implications in model assumptions and algorithm training.
Linking datasets may require those carrying out the linkage procedure to use identifiable data to match the data. It is therefore important to ensure that anyone with access to the identifiable data has a legal right of access. Similarly, the process of converting an identifiable dataset into an anonymised one, if conducted by a person, will need to be carried out by someone with a correct legal basis.
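The linkage trade-off described above – matching accuracy versus loss of records – can be made concrete with a toy fuzzy-matching sketch. The similarity measure (Python’s `difflib.SequenceMatcher`) and the threshold are our own illustrative choices, not an NHS linkage procedure.

```python
# An illustrative sketch of the linkage trade-off: a stricter similarity
# threshold improves matching accuracy but drops more records.
from difflib import SequenceMatcher

def link(records_a, records_b, key, threshold):
    """Pair each record in A with its most similar record in B,
    keeping only pairs whose key similarity meets the threshold."""
    linked, dropped = [], []
    for a in records_a:
        best = max(records_b,
                   key=lambda b: SequenceMatcher(None, a[key], b[key]).ratio())
        score = SequenceMatcher(None, a[key], best[key]).ratio()
        (linked if score >= threshold else dropped).append((a, best, score))
    return linked, dropped
```

Raising the threshold here systematically drops records whose identifiers contain typos – exactly the kind of non-random loss that can bias model assumptions downstream.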
Where you can access data sets
There is a range of sources of health data:
- Public Health England collects a range of data, made available in different formats, for example their fingertips tool
- the Office for National Statistics collects a range of health-related microdata at their ONS virtual microdata lab
- UCI have built an open source training dataset for machine learning (Health Data Research UK are in the process of building further training datasets)
- Health Data Finder
- NHS Digital Data Access Request Service
Access must be requested for data that is not already in the public domain. The process for this varies depending on the organisation providing the data, and this should be detailed on the organisation’s website. NHS Digital holds the responsibility for standardising, collecting and publishing data and information from across the health and social care system in England.
Training vs deployment
Be clear on the strengths and limitations of the training versus deployment data set. If the algorithm has been built on a training set and not yet deployed in a real-world clinical implementation, transparency should be shown to that effect. Demonstrate whether the algorithm is published in a real-world deployed environment or a training environment.
Principle 7: Show what type of algorithm is being developed or deployed, the ethical examination of how the data is used, how its performance will be validated and how it will be integrated into health and care provision
Consider how the introduction of AI will change relationships in health and care provision, and the implications of these changes for responsibility and liability. Use current best practice on how to explain algorithms to those taking actions based on their outputs.
When building an algorithm, be it a stand-alone product or integrated within a system, show it clearly and be transparent about the learning methodology (if any) that the algorithm is using. Undertake ethical examination of data use specific to this use-case. Achieving transparency of algorithms that have a higher potential for harm or unintended decision-making can help ensure the rights of the data subject, as set out in the Data Protection Act 2018, are met, build trust in users and enable better adoption and uptake.
Work collaboratively with partners, specify the context for the algorithm, specify potential alternative contexts and be transparent on whether the model is based on active, supervised or unsupervised learning. Show in a clear and transparent specification:
- the functionality of the algorithm
- the strengths and limitations of the algorithm (as far as they are known)
- its learning methodology
- whether it is ready for deployment or still in training
- how the decision has been made on the acceptable use of the algorithm in the context it is being used (for example, is there a committee, evidence or equivalent that has contributed to this decision?)
- the potential resource implications
This specification and transparency in development will build trust in incorporating machine-led decision-making into clinical care.’
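The specification the NHS asks for above could even be captured as a simple machine-readable record, model-card style, with a check that nothing required has been left blank. The field names and example values here are our own sketch, not an NHS schema.

```python
# A sketch of the transparency specification as a machine-readable
# record. Field names and values are illustrative, not an NHS schema.
import json

algorithm_spec = {
    "functionality": "flags scans for priority radiologist review",
    "strengths": ["fast triage of routine cases"],
    "limitations": ["trained mainly on adult patients"],
    "learning_methodology": "supervised",
    "status": "training",  # "training" or "deployed"
    "acceptance_decision": "reviewed by a (hypothetical) clinical safety committee",
    "resource_implications": "requires integration with existing imaging systems",
}

def validate_spec(spec):
    """Check every field the guidance asks for is present and non-empty."""
    required = {"functionality", "strengths", "limitations",
                "learning_methodology", "status",
                "acceptance_decision", "resource_implications"}
    missing = required - {k for k, v in spec.items() if v}
    if missing:
        raise ValueError(f"Incomplete specification: {sorted(missing)}")
    return json.dumps(spec, indent=2)
```

Publishing such a record alongside each algorithm would make the training-versus-deployment status, and the basis for the acceptance decision, auditable rather than a matter of trust.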
(Note: all NHS text quoted above comes directly from the NHS’s own site.)