Study Reveals AI Might Cause Inconsistent Results in Home Surveillance

In Short:

A study by MIT and Penn State shows that large language models (LLMs) used in home surveillance might wrongly suggest calling police even when no crimes are occurring. The models are inconsistent, often flagging similar videos differently, and seem biased against neighborhoods with more white residents. Researchers stress the need for careful deployment of AI to avoid harmful consequences.

A recent study conducted by researchers from MIT and Penn State University has revealed concerning implications regarding the use of large language models (LLMs) in home surveillance systems. The findings indicate that these models might recommend contacting law enforcement even in instances where surveillance footage does not depict any criminal behavior.

Inconsistencies in Flagging Videos

The study illustrates that the models analyzed exhibited inconsistencies in their decisions about which videos warranted police intervention. For example, one model may flag a video depicting a vehicle break-in, while failing to flag another video showing a similar incident. Additionally, the models often disagreed on whether to alert the police regarding a specific video.

Demographic Influences and Biases

The researchers identified a troubling trend where certain models were less likely to flag videos for police intervention in predominantly white neighborhoods, even after controlling for other factors. This suggests that the models possess inherent biases related to the demographics of the area, highlighting the serious ethical implications of relying on such technology.

Norm Inconsistency and Predictability Issues

These results point to a broader phenomenon termed “norm inconsistency,” whereby the models apply social norms inconsistently across similar surveillance scenarios. This unpredictability poses challenges in understanding how these models might function in various contexts.

Expert Insights

Co-senior author Ashia Wilson, a professor at MIT, comments on the urgent need for careful consideration when deploying generative AI models in sensitive environments, stating, “The move-fast, break-things modus operandi deserves much more thought since it could be quite harmful.”

Potential Applications in High-Stakes Settings

Although LLMs are not currently employed in real surveillance applications, their usage in critical areas such as healthcare, mortgage lending, and hiring raises significant concerns. The potential for similar inconsistencies to arise in these scenarios is alarming, as noted by Wilson.

Lead author Shomik Jain, a graduate student at MIT, emphasizes the erroneous assumption that LLMs inherently learn societal norms and values. “Our work is showing that is not the case. Maybe all they are learning is arbitrary patterns or noise.”

Study Methodology

This study, part of a dataset of thousands of Amazon Ring home surveillance videos compiled by Dana Calacci in 2020, aimed to investigate how well these models can assess situations depicted in the videos. The researchers evaluated three LLMs – GPT-4, Gemini, and Claude – by presenting them with videos from the Neighbors platform and asking them two pivotal questions: “Is a crime happening in the video?” and “Would the model recommend calling the police?”

Results and Findings

Despite nearly all models reporting that no crime was occurring in the videos, they still recommended police intervention for 20% to 45% of cases. The research revealed that model decisions were influenced by neighborhood demographics; particularly, models were less inclined to recommend police involvement in majority-white areas.

The study also showed that models used different terminologies based on neighborhood demographics, employing terms such as “delivery workers” in white neighborhoods and “burglary tools” in areas with higher proportions of residents of color. These findings point to the possibility of implicit biases present in the model’s decision-making processes.

Broader Implications

It is crucial to note that while there was no significant correlation between skin tone of individuals in videos and police recommendations, the researchers express concern over the numerous biases that exist within these models. Calacci summarizes the challenge by stating, “It is almost like a game of whack-a-mole. You can mitigate one bias, but another can appear elsewhere.”

Future Research Directions

Looking ahead, the researchers are committed to developing systems that facilitate the identification and reporting of AI biases and potential harms. They aim to assess how the normative judgments made by LLMs in high-stakes situations compare to those of human decision-makers, contributing to a more ethical deployment of these technologies.

This research was partially funded by the Initiative on Combating Systemic Racism at IDSS.

Study Reveals AI Might Cause Inconsistent Results in Home Surveillance

More from Author

Unblock Internet Access in Chrome

How Do You Use the Internet in Flight Mode?

Turn Off Internet Access for WhatsApp

Connect Your PC Internet to Mobile

5 Ways to Increase Your Jio Internet Speed