Responsible AI

1991 directly classified papers

Papers

Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset EMNLP 2025

MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models EMNLP 2025

Fostering Digital Inclusion for Low-Resource Nigerian Languages: A Case Study of Igbo and Nigerian Pidgin NAACL 2025

Alignment for Efficient Tool Calling of Large Language Models EMNLP 2025

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing AACL 2025

Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions EMNLP 2025

A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient NAACL 2025

From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs EMNLP 2025

MPTA: MultiTask Personalization Assessment EMNLP 2025

Iterative Prompt Refinement for Safer Text-to-Image Generation EMNLP 2025

HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech NAACL 2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models EMNLP 2025

Evaluating and Mitigating Linguistic Discrimination in Large Language Models: Perspectives on Safety Equity and Knowledge Equity IJCAI 2025

MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Correction EMNLP 2025

Rainbow-Teaming for the Polish Language: A Reproducibility Study NAACL 2025

Bias Amplification: Large Language Models as Increasingly Biased Media AACL 2025

Role-Aware Language Models for Secure and Contextualized Access Control in Organizations AACL 2025

On the Convergence of Moral Self-Correction in Large Language Models AACL 2025

PMPO: A Self-Optimizing Framework for Creating High-Fidelity Measurement Tools for Social Bias in Large Language Models AACL 2025

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization AACL 2025

Small Changes, Large Consequences: Analyzing the Allocational Fairness of LLMs in Hiring Contexts AACL 2025

ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting AACL 2025

Quantifying Cognitive Bias Induction in LLM-Generated Content AACL 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 AACL 2025

Are LLMs Rational Investors? A Study on the Financial Bias in LLMs ACL 2025