Responsible AI

1991 directly classified papers

Papers

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain EMNLP 2025

Analyzing values about gendered language reform in LLMs’ revisions EMNLP 2025

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment EMNLP 2025

Large Language Models Discriminate Against Speakers of German Dialects EMNLP 2025

Watermarking Large Language Models: An Unbiased and Low-risk Method ACL 2025

The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection EMNLP 2025

Unequal Scientific Recognition in the Age of LLMs EMNLP 2025

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages EMNLP 2025

Low-Resource Languages LLM Disinformation is Within Reach: The Case of Walliserdeutsch EMNLP 2025

Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency EMNLP 2025

A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users EMNLP 2025

Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs EMNLP 2025

ReviewRL: Towards Automated Scientific Review with RL EMNLP 2025

Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations EMNLP 2025

Adversarial Attacks Against Automated Fact-Checking: A Survey EMNLP 2025

Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP Artefacts EMNLP 2025

Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization EMNLP 2025

Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models EMNLP 2025

Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models COLING 2025

MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance EMNLP 2025

Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs EMNLP 2025

The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas EMNLP 2025

Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders EMNLP 2025

Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation EMNLP 2025

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings ACL 2025