Responsible AI

1991 directly classified papers

Papers

AI Chatbots as Professional Service Agents: Developing a Professional Identity EMNLP 2025

Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent EMNLP 2025

The discordance between embedded ethics and cultural inference in large language models EMNLP 2025

Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification EMNLP 2025

Reward Model Perspectives: Whose Opinions Do Reward Models Reward? EMNLP 2025

A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations EMNLP 2025

The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It EMNLP 2025

Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks EMNLP 2025

Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional Framework EMNLP 2025

Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety EMNLP 2025

Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models EMNLP 2025

Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset EMNLP 2025

MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models EMNLP 2025

Alignment for Efficient Tool Calling of Large Language Models EMNLP 2025

Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions EMNLP 2025

From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs EMNLP 2025

Iterative Prompt Refinement for Safer Text-to-Image Generation EMNLP 2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models EMNLP 2025

MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Correction EMNLP 2025

Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities EMNLP 2025

Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions EMNLP 2025

Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety EMNLP 2025

Who Holds the Pen? Caricature and Perspective in LLM Retellings of History EMNLP 2025

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs EMNLP 2025

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment EMNLP 2025