Investigating Intersectional Bias in Large Language Models via Coreference Confidence Disparities
Source: https://machinelearning.apple.com/research/investigating-intersectional
TL;DR
- Large language models (LLMs) are widely used as decision-support tools in settings like hiring and admissions, but can reflect and amplify societal biases.
- The WinoIdentity benchmark expands WinoBias with 25 demographic markers across 10 attributes, intersected with binary gender, creating 245,700 prompts to assess 50 bias patterns.
- The study introduces Coreference Confidence Disparity, a group fairness metric that measures differences in model confidence across intersectional identities.
- Evaluations on five recently published LLMs show confidence disparities up to 40%, with higher uncertainty for doubly-disadvantaged identities in anti-stereotypical contexts, suggesting memorization over reasoning.
- These findings highlight two independent failures, in value alignment and in validity, which can compound social harm in real-world deployments.
Context and background
Large language models have demonstrated strong performance and have been adopted as decision-support tools in high-stakes, resource-constrained contexts, such as hiring and admissions. However, there is broad scientific consensus that AI systems can reflect, and in some cases exacerbate, societal biases, raising concerns about identity-based harms when used in critical social contexts. Prior work established benchmarks that evaluate fairness along single axes of discrimination by examining demographic disparities across language reasoning tasks. In this work, the authors extend single-axis fairness evaluations to address intersectional bias, recognizing that overlapping axes of discrimination can create distinct patterns of disadvantage.

The researchers constructed a new benchmark, WinoIdentity, by augmenting the existing WinoBias dataset with 25 demographic markers spanning 10 attributes. These attributes are intersected with binary gender to yield 245,700 prompts in total, used to evaluate 50 distinct bias patterns. The focus is on harms of omission due to underrepresentation, assessed through the lens of model uncertainty rather than solely through accuracy metrics. The project also introduces a new metric, Coreference Confidence Disparity, designed to capture whether models exhibit higher or lower confidence for certain intersectional identities. The study assesses five recently published LLMs and reports notable confidence disparities across multiple demographic attributes, including body type, sexual orientation, and socio-economic status.
What’s new
- A new intersectional bias benchmark called WinoIdentity that expands WinoBias with 25 demographic markers across 10 attributes intersected with binary gender, for a total of 245,700 prompts and 50 bias patterns.
- Introduction of Coreference Confidence Disparity, a group (un)fairness metric that quantifies differences in model confidence across intersectional identities.
- Empirical evaluation of five recently published LLMs, revealing confidence disparities as high as 40% across various demographic attributes.
- Key finding: models are most uncertain about doubly-disadvantaged identities in anti-stereotypical settings, and confidence can decrease even for hegemonic or privileged markers, suggesting memorization rather than robust logical reasoning.
- The results point to two independent failures in value alignment and validity that can compound social harm in real-world applications.
Why it matters (impact for developers/enterprises)
For developers and enterprises deploying LLMs in real-world decision-support workflows, understanding model uncertainty is as important as baseline accuracy. If a system is more confident about one group than another, even with similar overall performance, downstream decisions may disproportionately affect certain identities. The study’s emphasis on uncertainty-aware fairness provides a lens to identify when outputs should be routed to human experts, flagged for review, or accompanied by safe defaults. As organizations increasingly rely on LLMs in sensitive domains, benchmarks like WinoIdentity and metrics like Coreference Confidence Disparity help quantify risks and guide more responsible model development and deployment practices.
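As a minimal illustration of this uncertainty-aware routing idea (a sketch, not part of the paper; the confidence threshold, the Prediction type, and the route_prediction helper are hypothetical), a deployment might escalate low-confidence coreference decisions to a human reviewer:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    answer: str
    confidence: float  # model probability assigned to its chosen answer, in [0, 1]

def route_prediction(pred: Prediction, threshold: float = 0.85) -> str:
    """Route low-confidence model outputs to human review instead of acting on them automatically."""
    if pred.confidence >= threshold:
        return "auto_accept"   # confident enough to use directly
    return "human_review"      # uncertain: escalate to a reviewer or fall back to a safe default

# Example: a borderline coreference decision gets escalated rather than silently accepted.
print(route_prediction(Prediction(answer="the physician", confidence=0.62)))  # -> human_review
```

In practice the threshold would need to be calibrated per group, since the study's central finding is precisely that model confidence is not uniformly reliable across identities.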
Technical details
The core methodological contributions center on data construction and measurement:
- WinoIdentity combines 25 demographic markers across 10 attributes with binary gender, producing 245,700 prompts that map to 50 distinct bias patterns. This design enables analysis of how multiple, intersecting identity factors influence model behavior and confidence (an illustrative prompt-construction sketch follows this list).
- Coreference Confidence Disparity measures how confident the model is about its coreference-related inferences across different intersectional identity groups. This metric targets whether the model's uncertainty is distributed unevenly across identities, which can indicate hidden biases in reasoning or representation (a sketch of one possible reading of the metric also follows this list).
- The evaluation covers five recently published LLMs to examine whether improvements in overall capabilities align with more uniform confidence across diverse identities.
- Findings indicate confidence disparities up to 40% across attributes including body type, sexual orientation, and socio-economic status, underscoring that high performance in general tasks does not guarantee fair or reliable behavior across all groups.
- An important observation is that the highest uncertainty often arises for identities that are doubly disadvantaged in anti-stereotypical scenarios, suggesting complex interactions between identity factors and model reasoning.
- The authors report that confidence declines even for markers traditionally considered privileged, hinting at memorization effects rather than robust, logic-based generalization as a dominant factor in current LLM behavior.
- The study frames these results as two independent failures: value alignment (whether models reflect desired social values) and validity (whether outputs reliably correspond to correct reasoning across identities). Both failures can occur simultaneously and compound potential harms in downstream applications.
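To make the data-construction idea concrete, here is a minimal sketch of how WinoBias-style templates can be crossed with demographic markers. The marker lists, template, and field names below are illustrative assumptions, not the benchmark's actual contents:

```python
from itertools import product

# Hypothetical demographic markers grouped by attribute; the actual WinoIdentity
# marker lists are defined in the paper and benchmark release.
MARKERS = {
    "age": ["young", "elderly"],
    "body_type": ["slim", "plus-size"],
}
GENDERS = ["he", "she"]  # WinoBias-style binary gender pronouns

# A WinoBias-style coreference template with a slot for an identity marker.
TEMPLATE = "The {marker} physician hired the secretary because {pronoun} was overwhelmed with clients."

def build_prompts():
    """Cross every demographic marker with each gendered pronoun to produce prompts."""
    prompts = []
    for (attribute, markers), pronoun in product(MARKERS.items(), GENDERS):
        for marker in markers:
            prompts.append({
                "attribute": attribute,
                "marker": marker,
                "pronoun": pronoun,
                "text": TEMPLATE.format(marker=marker, pronoun=pronoun),
            })
    return prompts

prompts = build_prompts()
print(len(prompts), prompts[0]["text"])
```

The released benchmark applies this kind of crossing at far larger scale (25 markers over 10 attributes against the WinoBias templates), which is how the 245,700 prompts reported above arise.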
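The paper's exact formulation of Coreference Confidence Disparity is not reproduced in this summary; the sketch below assumes one plausible reading, namely the gap in mean model confidence on correct coreference answers between intersectional groups. The group labels, record format, and confidence_disparity helper are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def confidence_disparity(records):
    """
    records: iterable of dicts with a "group" label (an intersectional identity,
    e.g. "female & plus-size") and a "confidence" score the model assigned to
    the correct coreference resolution.
    Returns the gap between the most- and least-confident groups, plus per-group means.
    """
    by_group = defaultdict(list)
    for r in records:
        by_group[r["group"]].append(r["confidence"])

    group_means = {g: mean(scores) for g, scores in by_group.items()}
    return max(group_means.values()) - min(group_means.values()), group_means

# Toy example: a 0.25 (25 percentage point) confidence gap between two groups.
toy = [
    {"group": "male & slim", "confidence": 0.92},
    {"group": "male & slim", "confidence": 0.88},
    {"group": "female & plus-size", "confidence": 0.66},
    {"group": "female & plus-size", "confidence": 0.64},
]
gap, per_group = confidence_disparity(toy)
print(f"disparity = {gap:.2f}", per_group)
```

Under this reading, a reported 40% disparity would correspond to a 0.40 gap between the most- and least-confidently handled groups.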
Key takeaways
- Intersectional bias in LLMs requires evaluating both accuracy and model confidence across identity groups, not just group-level averages.
- The WinoIdentity benchmark provides a scalable way to test a wide range of demographic intersections with many bias patterns.
- Coreference Confidence Disparity reveals substantial disparities in model confidence across identities, highlighting reliability gaps that could impact decision-making.
- Results suggest that current LLM behavior may depend more on memorization than solid logical reasoning for certain identity contexts.
- Enterprises should consider uncertainty-aware risk assessments and appropriate routing or safeguards when deploying LLMs in high-stakes domains.
FAQ
What is WinoIdentity and why was it created?
WinoIdentity is a benchmark that augments WinoBias with 25 demographic markers across 10 attributes, intersected with binary gender, yielding 245,700 prompts to evaluate 50 bias patterns. It aims to study intersectional bias and underrepresentation harms.
What does Coreference Confidence Disparity measure?
It measures whether models are more or less confident for certain intersectional identities, capturing group-level differences in uncertainty beyond raw accuracy.
How many models were tested and what were the key findings?
Five recently published LLMs were evaluated. The study observed confidence disparities up to 40% across attributes, with higher uncertainty for doubly-disadvantaged identities in anti-stereotypical contexts, and declines in confidence even for privileged markers.
What are the practical implications for deployment?
The results emphasize the need to account for model uncertainty in high-stakes applications, potentially routing uncertain cases to human reviewers or implementing safe defaults to mitigate social harms.
Where can I read more about this work?
The full study and benchmark are described on Apple's Machine Learning Research site at https://machinelearning.apple.com/research/investigating-intersectional.