AI notification summaries may have racial and gender biases

Researchers have found that when test queries sent to Apple Intelligence via developer tools are intentionally ambiguous about race and gender, biases appear in the resulting summaries.

AI Forensics, a German nonprofit, analyzed over 10,000 notification summaries created by Apple’s AI feature. The report suggests that Apple Intelligence treats White people as the “default” while applying gender stereotypes when no gender has been specified.

According to the report, Apple Intelligence has a tendency to drop a person's ethnicity from its summaries when they are white. Conversely, messages that mentioned another ethnicity regularly saw the notification summary repeat it.

The report found that, given otherwise identical messages, Apple's AI model mentioned a person's ethnicity in its summaries only 53% of the time when the person was white. The figure was considerably higher for other groups: ethnicity was mentioned 89% of the time when the person was Asian, 86% when Hispanic, and 64% when Black.

The research claims that Apple Intelligence assumes the person mentioned in a message is white the majority of the time. Effectively, the model treats white as the norm.

Another example shows Apple Intelligence assigning gender roles when none were given.

The tests used a sentence that mentioned both a doctor and a nurse without specifying either person's gender. Nevertheless, Apple Intelligence introduced associations that weren't in the original message in 77% of the summaries tested.

Further, in 67% of those instances Apple Intelligence assumed the doctor was a man, and it made the corresponding assumption that the nurse was a woman.

Notably, the assumptions are believed to stem from the AI's training data. They closely align with U.S. workforce demographics, suggesting that the model is simply reproducing the patterns it was trained on.

Similar biases were observed across a range of other criteria. In total, the report covers eight social dimensions, including age, disability, nationality, religion, and sexual orientation, all of which were subject to the AI's assumptions.

Methods and limitations

In the report detailing its work, AI Forensics explains that it ran its tests through a custom application built with Apple's developer tools. That application hooked into Apple's Foundation Models framework to process simulated real-world messages.
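
AI Forensics has not published its harness here, but the general shape of such a probe is straightforward. The following is a minimal Swift sketch, assuming the LanguageModelSession and respond(to:) API surface that Apple documents for the FoundationModels framework; the instructions and test message are illustrative stand-ins, not AI Forensics' actual material.

```swift
import FoundationModels

// Minimal sketch of a bias probe against Apple's on-device model.
// Assumes the FoundationModels API shape (LanguageModelSession, respond(to:));
// the prompt text is an illustrative stand-in for the study's test messages.
@main
struct SummaryProbe {
    static func main() async throws {
        let session = LanguageModelSession(
            instructions: "Summarize the incoming message in one short sentence, as a notification summary would."
        )

        // Intentionally ambiguous message in the spirit of the doctor/nurse test:
        // no gender is given, so any pronoun in the summary is the model's own assumption.
        let message = "The doctor met the nurse in the hallway and asked about tomorrow's schedule."

        let response = try await session.respond(to: message)
        print(response.content)
    }
}
```

Repeating a prompt like this across many paraphrases and tallying which role receives which pronoun is, in outline, how rates such as the 77% and 67% figures above would be measured.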

That approach means the testing closely matches what users of third-party messaging apps might experience. However, there is still considerable room for inaccuracy.

AI Forensics admits that its “test scenarios are synthetic constructions designed to probe specific bias dimensions, not naturalistic notifications.” It adds that real messages may be written differently and, as a result, interpreted differently by Apple Intelligence.

The outfit also notes that real-world messages may not use the same “ambiguous pronoun references” as its test messages. This, we think, is the biggest flaw in the research.

However, it's important to note that biases like the ones shown in this report can have a huge impact at Apple's scale. Apple Intelligence is used on hundreds of millions of devices every day.

Similar results to those highlighted in this report may well occur in considerable numbers.

More bad press for Apple’s summaries

This isn’t the first time that Apple’s AI-powered notification summaries have come under fire. In December 2024, the BBC complained that summaries of its news articles were wrong.

One example notification read “Luigi Mangione shoots himself,” referring to the man arrested for the murder of UnitedHealthcare CEO Brian Thompson. Mangione was, and is, alive and currently awaiting trial.

Apple subsequently disabled notification summaries for news apps while it worked on fixing the issue. But this report shows that notifications for communication apps, like Messages, continue to prove problematic.

Apple is clearly aware of Apple Intelligence's shortcomings. The company recently signed a deal to bring Google's Gemini AI model to Siri.

But following reports that the revamped Siri will not ship with iOS 26.4 as expected, hopes of an imminent improvement have been dashed.

Interestingly, AI Forensics also notes that Google's Gemma3-1B model is much smaller than Apple's, yet more accurate. In testing, it hallucinated less frequently and less stereotypically.

Apple recently placed software chief Craig Federighi in charge of its AI efforts, a sign that it isn’t happy with Apple Intelligence as-is. But improvements are slow to come.

Any hope of a quick fix for the kinds of biases highlighted by AI Forensics is likely to be dashed even more quickly.
