Responsible AI

1991 directly classified papers

Papers

Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts ACL 2025

The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation ACL 2025

Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design ACL 2025

Multilingual NLP for African Healthcare: Bias, Translation, and Explainability Challenges ACL 2025

Improved Unbiased Watermark for Large Language Models ACL 2025

Moral Compass: A Data-Driven Benchmark for Ethical Cognition in AI IJCAI 2025

Rethinking Prompt-based Debiasing in Large Language Model ACL 2025

7 Points to Tsinghua but 10 Points to 清华? Assessing Large Language Models in Agentic Multilingual National Bias ACL 2025

If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World ACL 2025

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents ACL 2025

SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models ICCV 2025

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models ACL 2025

Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs IJCAI 2025

AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation EMNLP 2025

Scaling Trends for Data Poisoning in LLMs AAAI 2025

Data Attribution: A Data-Centric Approach for Trustworthy AI Development AAAI 2025

Fair Domain Generalization with Heterogeneous Sensitive Attributes Across Domains WACV 2025

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning ICCV 2025

Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy ICCV 2025

Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs ACL 2025

TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models ICCV 2025

Which Demographics do LLMs Default to During Annotation? ACL 2025

Modality-Fair Preference Optimization for Trustworthy MLLM Alignment IJCAI 2025

Investigating and Mitigating Undesirable Biases in Large Language Models AAAI 2025

Defining and Quantifying Visual Hallucinations in Vision-Language Models NAACL 2025