Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
NLP for Counterspeech against Hate and Misinformation (CSHAM)
ACL 2025
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
ACL 2025
A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models
AACL 2025
Translate With Care: Addressing Gender Bias, Neutrality, and Reasoning in Large Language Model Translations
ACL 2025
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
AACL 2025
Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models
ACL 2025
Data Caricatures: On the Representation of African American Language in Pretraining Corpora
ACL 2025
Bias Amplification: Large Language Models as Increasingly Biased Media
IJCNLP 2025
Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion
ACL 2025
Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP
ACL 2025
Oversight Structures for Agentic AI in Public-Sector Organizations
ACL 2025
Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
IJCNLP 2025
From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs
ACL 2025
DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
ACL 2025
PMPO: A Self-Optimizing Framework for Creating High-Fidelity Measurement Tools for Social Bias in Large Language Models
IJCNLP 2025
Navigating Ethical Challenges in NLP: Hands-on strategies for students and researchers
ACL 2025
Taxonomizing Representational Harms using Speech Act Theory
ACL 2025
Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation
ACL 2025
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
ACL 2025
LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
ACL 2025
Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications
ACL 2025
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
ACL 2025
FADE: Why Bad Descriptions Happen to Good Features
ACL 2025
From Complexity to Clarity: AI/NLP’s Role in Regulatory Compliance
ACL 2025
Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
ACL 2025