3 principles for regulatory-grade large language model application – CIO

1. The No-BS Principle

Under the No-BS Principle, it is unacceptable for LLMs to hallucinate or to produce results without explaining their reasoning. Unsupported output is dangerous in any industry, but it is especially critical in regulated sectors such as healthcare, where different professionals apply different thresholds for what they consider valid evidence.

For example, a good result in a single clinical trial may be enough to justify considering an experimental treatment or a follow-on trial, but not enough to change the standard of care for all patients with a specific disease. To prevent misunderstandings and protect everyone involved, LLMs should back their results with valid data and cite their sources. This allows human users to verify the information and make informed decisions.

Moreover, LLMs should strive for transparency in their methodologies, showcasing how they arrived at a given conclusion. For instance, when generating a diagnosis, an LLM should provide not only the most probable disease but also the symptoms and findings that led to that conclusion. This level of explainability will help build trust between users and the artificial intelligence (AI) system, ultimately leading to better outcomes.
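One way to make this requirement checkable is to treat an answer as structured data and reject it when the evidence is missing. The following is a minimal sketch, not a reference implementation: the class names, field names, and the `explainability_problems` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    source: str   # identifier of the cited document (hypothetical format)
    excerpt: str  # the passage the conclusion relies on


@dataclass
class DiagnosisAnswer:
    condition: str                                    # the most probable disease
    supporting_findings: list[str] = field(default_factory=list)
    citations: list[Citation] = field(default_factory=list)


def explainability_problems(answer: DiagnosisAnswer) -> list[str]:
    """Return the reasons an answer fails the check; an empty list means it passes."""
    problems = []
    if not answer.supporting_findings:
        problems.append("no supporting symptoms or findings")
    if not answer.citations:
        problems.append("no cited sources")
    if any(not c.excerpt.strip() for c in answer.citations):
        problems.append("citation without a quoted excerpt")
    return problems
```

A check like this only confirms that evidence is present; whether the cited sources actually support the conclusion still has to be verified by a human reviewer.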

2. The No Data Sharing Principle

Under the No Data Sharing Principle, organizations must not be required to share sensitive data, whether proprietary information or personal details, in order to use these advanced technologies. Companies should be able to run the software within their own firewalls, under their full set of security and privacy controls, and in compliance with country-specific data residency laws, without ever sending any data outside their networks.

This does not mean that organizations must give up the advantages of cloud computing. On the contrary, the software can still be deployed with one click on any public or private cloud, then managed and scaled as needed. The difference is that the deployment runs within the organization’s own virtual private cloud (VPC), ensuring that no data ever leaves its network. In essence, users should be able to enjoy the benefits of LLMs without compromising their data or intellectual property.

To illustrate this principle in action, consider a pharmaceutical company using an LLM to analyze proprietary data on a new drug candidate. The company must ensure that its sensitive information remains confidential and protected from potential competitors. By deploying the LLM within its own VPC, the company can benefit from the AI’s insights without risking the exposure of its valuable data.
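As a small illustration of the idea, an application could refuse to send prompts to any endpoint that is not clearly inside the organization's network. The sketch below is an assumption-laden example rather than a complete control: the function name `stays_inside_network` and the `allowed_hosts` parameter are hypothetical, it does no DNS resolution, and a real deployment would rely on network-level egress rules rather than application code alone.

```python
import ipaddress
from urllib.parse import urlparse


def stays_inside_network(endpoint_url: str,
                         allowed_hosts: frozenset[str] = frozenset()) -> bool:
    """Return True only if the endpoint host is a private or loopback IP
    literal, or a hostname explicitly listed as internal."""
    host = urlparse(endpoint_url).hostname
    if host is None:
        return False
    if host in allowed_hosts:
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Unknown hostname: refuse rather than guess where it resolves.
        return False
    return address.is_private or address.is_loopback
```

The design choice here is to fail closed: anything that cannot be shown to be internal is treated as external.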

3. The No Test Gaps Principle

Under the No Test Gaps Principle, every LLM must be tested holistically with a reproducible test suite before deployment. All dimensions that impact performance must be tested: accuracy, fairness, robustness, toxicity, representation, bias, veracity, freshness, efficiency, and others. In short, providers must demonstrate that their models are safe and effective.

To achieve this, the tests themselves should be public, human-readable, executable using open-source software, and independently verifiable. Although metrics may not always be perfect, they must be transparent and available across a comprehensive risk management framework. A provider should be able to show a customer or a regulator the test suite that was used to validate each version of the model.
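One possible way to tie a specific test suite to a specific model version is to publish the suite as human-readable JSON and record its hash alongside the results. This is a sketch under stated assumptions: the record layout and the names `suite_fingerprint` and `release_record` are hypothetical, and only Python's standard `hashlib` and `json` modules are used.

```python
import hashlib
import json


def suite_fingerprint(test_cases: list[dict]) -> str:
    """Hash a canonical JSON encoding so the same suite always
    yields the same SHA-256 digest, regardless of key order."""
    canonical = json.dumps(test_cases, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def release_record(model_version: str,
                   test_cases: list[dict],
                   results: dict[str, float]) -> dict:
    """Bundle what a customer or regulator would need in order to re-run
    and compare: the model version, the suite digest, and the metrics."""
    return {
        "model_version": model_version,
        "suite_sha256": suite_fingerprint(test_cases),
        "case_count": len(test_cases),
        "results": results,
    }
```

Anyone holding the published suite can recompute the digest and confirm it matches the one recorded for a given model version.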

Consider, as a practical example of the No Test Gaps Principle, an LLM that diagnoses medical conditions based on patient symptoms. Providers must ensure that the model is tested extensively for accuracy, taking into account various demographic factors, potential biases, and the prevalence of rare diseases. Additionally, the model should be evaluated for robustness, ensuring that it remains effective even when faced with incomplete or noisy data. Lastly, the model should be tested for fairness, ensuring that it does not discriminate against any particular group or population.
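The accuracy, fairness, and robustness checks described above could be sketched roughly as follows. Everything here is illustrative: the case layout (keys `symptoms`, `label`, `group`), the function names, the choice of "drop one symptom" as the perturbation, and the `stub_model` stand-in are assumptions, not part of any real evaluation framework.

```python
from collections import defaultdict
from typing import Callable

Model = Callable[[list[str]], str]  # maps a list of symptoms to a predicted label


def accuracy_by_group(model: Model, cases: list[dict]) -> dict[str, float]:
    """Accuracy computed separately for each demographic group."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        total[case["group"]] += 1
        if model(case["symptoms"]) == case["label"]:
            correct[case["group"]] += 1
    return {group: correct[group] / total[group] for group in total}


def fairness_gap(per_group: dict[str, float]) -> float:
    """Difference between the best-served and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


def robustness_accuracy(model: Model, cases: list[dict]) -> float:
    """Accuracy after simulating incomplete records by dropping the last symptom."""
    hits = sum(model(case["symptoms"][:-1]) == case["label"] for case in cases)
    return hits / len(cases)


def stub_model(symptoms: list[str]) -> str:
    """Placeholder standing in for a real LLM call."""
    return "condition_a" if "s1" in symptoms else "condition_b"
```

A provider could then set explicit thresholds, for example a maximum allowed fairness gap, and fail the release when any threshold is not met.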

By making these tests public and verifiable, customers and regulators can have confidence in the safety and efficacy of the LLM, while also holding providers accountable for the performance of their models.

In summary, when integrating large language models into regulated industries, we must adhere to three key principles: No-BS, No Data Sharing, and No Test Gaps. By upholding these principles, we can create a world where LLMs are explainable, private, and responsible, ultimately ensuring that they are used safely and effectively in critical sectors like healthcare and life sciences.

As we move forward in the age of AI, the road ahead is filled with exciting opportunities, as well as challenges that must be addressed. By maintaining a steadfast commitment to the principles of explainability, privacy, and responsibility, we can ensure that the integration of LLMs into regulated industries is both beneficial and safe. This will allow us to harness the power of AI for the greater good, while also protecting the interests of individuals and organizations alike.


