Labeling AI Systems: Are Labels Like Those on Prescription Drugs Necessary?

In Short:

AI systems are increasingly used in safety-critical healthcare settings, but they can make mistakes that put patients at serious risk. Experts from MIT and Boston University suggest adding responsible-use labels to these AI models, similar to those on prescription medications. The labels would describe a model’s training data, potential biases, and recommended usage, supporting safer deployment in healthcare settings.


Artificial Intelligence (AI) systems are increasingly being implemented in safety-critical healthcare environments. However, these models occasionally generate inaccurate information, exhibit biased predictions, or fail unexpectedly, which can have severe implications for both patients and clinicians.

In a commentary article published today in Nature Computational Science, MIT Associate Professor Marzyeh Ghassemi and Boston University Associate Professor Elaine Nsoesie advocate for the introduction of responsible-use labels for AI systems. These labels would function much like those the U.S. Food and Drug Administration mandates for prescription medications, with the aim of reducing the risks AI poses in healthcare.

Need for Responsible-Use Labels

Q: Why do we need responsible-use labels for AI systems in healthcare settings?

A: In healthcare environments, doctors often depend on technologies or treatments that they may not fully comprehend. This lack of understanding can stem from fundamental complexities—such as the mechanisms of acetaminophen—or from the limits of specialization; for instance, we do not expect clinicians to maintain or repair an MRI machine. To address this, we have established certification systems through agencies like the FDA that ensure medical devices and drugs are appropriate for specific applications.

Moreover, medical devices typically include service contracts; for example, a technician from the manufacturer will repair a miscalibrated MRI machine. For approved drugs, systems of post-market surveillance exist to address adverse effects or events—identifying complications that might arise when a significant number of patients using a drug exhibit similar health issues.

In contrast, models and algorithms, whether or not they use AI, largely bypass these approval and long-term monitoring frameworks, which is concerning. Prior studies have shown that predictive models need thorough evaluation and ongoing monitoring, a need that the capabilities of contemporary generative AI only heighten. Without comparable scrutiny of model predictions, problematic outputs are harder to detect. Generative models already used in hospitals could also be biased. Responsible-use labels could serve as a safeguard against perpetuating biases that originate from human practitioners or from flawed clinical decision-support processes of the past.

Core Information for Responsible-Use Labels

Q: Your article outlines several elements of a responsible-use label for AI, paralleling the FDA’s approach in creating prescription labels, including approved usage, ingredients, and potential side effects. What essential information should these labels include?

A: Labels should clearly articulate the time, place, and conditions of a model’s intended use. For example, users should be informed about the temporal context in which models were trained and whether data reflects conditions during significant events like the Covid-19 pandemic. This historical context could influence health practices and impact model outputs. Therefore, we advocate for transparency regarding ‘model ingredients’ and any ‘completed studies’.

Geographical considerations also matter; previous research has shown that models trained in one location often underperform when applied elsewhere. Understanding the origin of training data and the population context can inform users about possible ‘side effects’, as well as necessary ‘warnings and precautions’. For models designed to predict specific outcomes, knowing when and where they were trained helps ensure prudent deployment. However, many generative models possess considerable versatility and can serve multiple purposes. In such cases, clearer guidance on ‘approved usage’ versus ‘unapproved usage’ becomes paramount. For instance, if a developer has assessed a generative model for decoding clinical notes and creating billing codes but discovered a bias towards overbilling specific conditions, that information must be disclosed. Misusing this flexible model for critical decisions, like specialist referrals, could lead to adverse outcomes.

In general, while advocating for the most robust model training possible with available resources, we emphasize the importance of transparency. No model is infallible; society now recognizes that no medication is without risk. The same perspective should apply to AI models. Any model—whether AI-driven or not—has limitations and should be treated with appropriate caution.
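
To make the elements discussed above concrete, here is a minimal sketch of how a label covering approved and unapproved usage, model ingredients, completed studies, side effects, and warnings and precautions might be recorded as machine-readable metadata shipped alongside a model. The commentary does not specify any such format; the class, field names, and example values below are purely hypothetical.

```python
# Hypothetical sketch of a machine-readable responsible-use label.
# The commentary does not define a schema; these field names and example
# values are illustrative assumptions, not a proposed standard.
from dataclasses import dataclass
from typing import List


@dataclass
class ResponsibleUseLabel:
    model_name: str
    approved_uses: List[str]             # applications the developer has evaluated
    unapproved_uses: List[str]           # known misuses to avoid
    model_ingredients: List[str]         # training data sources and time window
    completed_studies: List[str]         # evaluations run before release
    side_effects: List[str]              # observed failure modes or biases
    warnings_and_precautions: List[str]  # deployment caveats, e.g. population shift


# Example drawn loosely from the billing-code scenario described above.
label = ResponsibleUseLabel(
    model_name="clinical-notes-coder-v1",  # hypothetical model
    approved_uses=["Suggest billing codes from clinical notes, with human review"],
    unapproved_uses=["Automated specialist-referral decisions"],
    model_ingredients=["De-identified notes from one hospital system, 2018-2021"],
    completed_studies=["Billing-code accuracy audit across patient subgroups"],
    side_effects=["Tendency to over-bill certain conditions"],
    warnings_and_precautions=["Not validated outside the training region or period"],
)

print(label.warnings_and_precautions)
```

In practice, a deployment checklist could decline to install a model whose label leaves any of these fields empty.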

Implementation of AI Labels

Q: If AI labels were to be implemented, who would be responsible for labeling, and how would these labels be regulated and enforced?

A: For models not intended for practical application, disclosures equivalent to those for high-quality research publications would suffice. However, once a model is to be deployed in a human-facing environment, both developers and implementers should conduct preliminary labeling based on established frameworks. These claims must be validated before deployment; in critical contexts like healthcare, agencies within the Department of Health and Human Services could oversee that process.

For model developers, knowing that labeling will be required can promote more rigorous consideration of the development process itself. For example, knowing that you must disclose the demographic characteristics of the data a model was trained on, such as whether it consists predominantly of dialogue from male chatbot users, directs attention to potential deployment challenges.

Considerations such as the demographics of the dataset, the duration of data collection, sample sizes, and the rationale for including or excluding specific data are integral for anticipating potential issues during deployment.
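
As a rough illustration of that point, and assuming those disclosures were recorded in a structured form (the record and field names below are made up for this sketch), a simple pre-deployment check could flag anything a developer has left undocumented:

```python
# Hypothetical training-data disclosure record with a basic completeness check.
# Field names and values are illustrative assumptions, not a defined standard.
from dataclasses import asdict, dataclass


@dataclass
class TrainingDataDisclosure:
    demographics: str         # who is represented in the data
    collection_period: str    # when the data were gathered
    sample_size: int          # how many records were used
    inclusion_rationale: str  # why data were included or excluded


def missing_fields(disclosure: TrainingDataDisclosure) -> list:
    """Return the disclosure fields that were left empty."""
    return [name for name, value in asdict(disclosure).items() if value in ("", 0)]


disclosure = TrainingDataDisclosure(
    demographics="Chatbot dialogues, predominantly male users",  # hypothetical skew
    collection_period="2019-2022",
    sample_size=250_000,
    inclusion_rationale="",  # left blank so the check has something to flag
)

print(missing_fields(disclosure))  # -> ['inclusion_rationale']
```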
