Fairness

1139 directly classified papers

Papers

Multilingual Large Language Models Leak Human Stereotypes across Language Boundaries ACL 2025

AI Tools Can Generate Misculture Visuals! Detecting Prompts Generating Misculture Visuals For Prevention ACL 2025

Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models ACL 2025

CLAIM: An Intent-Driven Multi-Agent Framework for Analyzing Manipulation in Courtroom Dialogues ACL 2025

Comparing Methods for Multi-Label Classification of Manipulation Techniques in Ukrainian Telegram Content ACL 2025

GBEM-UA: Gender Bias Evaluation and Mitigation for Ukrainian Large Language Models ACL 2025

The UNLP 2025 Shared Task on Detecting Social Media Manipulation ACL 2025

A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection ACL 2025

Towards Fairness Assessment of Dutch Hate Speech Detection ACL 2025

Red-Teaming for Uncovering Societal Bias in Large Language Models ACL 2025

BiasEdit: Debiasing Stereotyped Language Models via Model Editing NAACL 2025

Towards Inclusive Arabic LLMs: A Culturally Aligned Benchmark in Arabic Large Language Model Evaluation COLING 2025

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads NAACL 2025

Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization EMNLP 2025

Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding NAACL 2025

Evaluating Dialect Robustness of Language Models via Conversation Understanding COLING 2025

Gender Encoding Patterns in Pretrained Language Model Representations NAACL 2025

GENDER1PERSON: Test Suite for Estimating Gender Bias of First-person Singular Forms EMNLP 2025

Gender Bias in Large Language Models across Multiple Languages: A Case Study of ChatGPT NAACL 2025

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection COLING 2025

Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions NAACL 2025

HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech NAACL 2025

Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets EMNLP 2025

Multi-Group Proportional Representations for Text-to-Image Models CVPR 2025

Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey NAACL 2025