Responsible AI

1991 directly classified papers

Papers

Alignment for Efficient Tool Calling of Large Language Models EMNLP 2025

SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs NAACL 2025

How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? NAACL 2025

Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Models NAACL 2025

Atoxia: Red-teaming Large Language Models with Target Toxic Answers NAACL 2025

The Lawyer That Never Thinks: Consistency and Fairness as Keys to Reliable AI ACL 2025

Who Holds the Pen? Caricature and Perspective in LLM Retellings of History EMNLP 2025

White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs ACL 2025

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs EMNLP 2025

Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities NAACL 2025

Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency EMNLP 2025

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages EMNLP 2025

Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities EMNLP 2025

The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection EMNLP 2025

Large Language Models Discriminate Against Speakers of German Dialects EMNLP 2025

Watermarking Large Language Models: An Unbiased and Low-risk Method ACL 2025

Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions EMNLP 2025

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training EMNLP 2025

Identifying Unlearned Data in LLMs via Membership Inference Attacks EMNLP 2025

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages EMNLP 2025

Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models EMNLP 2025

MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance EMNLP 2025

Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders EMNLP 2025

Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation EMNLP 2025

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment EMNLP 2025