Responsible AI

1991 directly classified papers

Papers

NLP for Counterspeech against Hate and Misinformation (CSHAM) ACL 2025

Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models ACL 2025

A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models AACL 2025

Translate With Care: Addressing Gender Bias, Neutrality, and Reasoning in Large Language Model Translations ACL 2025

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing AACL 2025

Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models ACL 2025

Data Caricatures: On the Representation of African American Language in Pretraining Corpora ACL 2025

Bias Amplification: Large Language Models as Increasingly Biased Media IJCNLP 2025

Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion ACL 2025

Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP ACL 2025

Oversight Structures for Agentic AI in Public-Sector Organizations ACL 2025

Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms IJCNLP 2025

From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs ACL 2025

DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification ACL 2025

PMPO: A Self-Optimizing Framework for Creating High-Fidelity Measurement Tools for Social Bias in Large Language Models IJCNLP 2025

Navigating Ethical Challenges in NLP: Hands-on strategies for students and researchers ACL 2025

Taxonomizing Representational Harms using Speech Act Theory ACL 2025

Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation ACL 2025

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges ACL 2025

LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion ACL 2025

Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications ACL 2025

Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation ACL 2025

FADE: Why Bad Descriptions Happen to Good Features ACL 2025

From Complexity to Clarity: AI/NLP’s Role in Regulatory Compliance ACL 2025

Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models ACL 2025