Responsible AI

1991 directly classified papers

Papers

SafePersuasion: A Dataset, Taxonomy, and Baselines for Analysis of Rational Persuasion and Manipulation IJCNLP 2025

Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance IJCNLP 2025

UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases IJCNLP 2025

To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs IJCNLP 2025

GeoSAFE - A Novel Geospatial Artificial Intelligence Safety Assurance Framework and Evaluation for LLM Moderation IJCNLP 2025

Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models IJCNLP 2025

Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios NAACL 2025

Intersectional Bias in Japanese Large Language Models from a Contextualized Perspective ACL 2025

Certified Mitigation of Worst-Case LLM Copyright Infringement EMNLP 2025

An Ethical Dataset from Real-World Interactions Between Users and Large Language Models IJCAI 2025

The Threat of PROMPTS in Large Language Models: A System and User Prompt Perspective ACL 2025

BanHateME: Understanding Hate in Bangla Memes through Detection, Categorization, and Target Profiling IJCNLP 2025

EMBRACE: Shaping Inclusive Opinion Representation by Aligning Implicit Conversations with Social Norms IJCNLP 2025

Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews EMNLP 2025

Instantly Learning Preference Alignment via In-context DPO NAACL 2025

Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities NAACL 2025

SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains EMNLP 2025

What is Behind Homelessness Bias? Using LLMs and NLP to Mitigate Homelessness by Acting on Social Stigma IJCAI 2025

DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification ACL 2025

Wanted: Personalised Bias Warnings for Gender Bias in Language Models ACL 2025

Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards EMNLP 2025

Beemo: Benchmark of Expert-edited Machine-generated Outputs NAACL 2025

Decoupling Memories, Muting Neurons: Towards Practical Machine Unlearning for Large Language Models ACL 2025

Truth, Trust, and Trouble: Medical AI on the Edge EMNLP 2025

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings ACL 2025