Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Style Over Substance: Evaluation Biases for Large Language Models
COLING 2025
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
COLING 2025
Exploring Backdoor Vulnerabilities of Chat Models
COLING 2025
Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs
COLING 2025
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
COLING 2025
“Not Aligned” is Not “Malicious”: Being Careful about Hallucinations of Large Language Models’ Jailbreak
COLING 2025
The Gaps between Fine Tuning and In-context Learning in Bias Evaluation and Debiasing
COLING 2025
LLM Sensitivity Challenges in Abusive Language Detection: Instruction-Tuned vs. Human Feedback
COLING 2025
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration
COLING 2025
Automated Progressive Red Teaming
COLING 2025
MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
ACL 2025
From Complexity to Clarity: AI/NLP’s Role in Regulatory Compliance
ACL 2025
Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language
NAACL 2025
How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
NAACL 2025
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering
EMNLP 2025
Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
EMNLP 2025
Meta-Cultural Competence: Climbing the Right Hill of Cultural Awareness
NAACL 2025
DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language Models
EMNLP 2025
R-TOFU: Unlearning in Large Reasoning Models
EMNLP 2025
Biased LLMs can Influence Political Decision-Making
ACL 2025
ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs
NAACL 2025
Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?
EMNLP 2025
SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
EMNLP 2025
Anak Baik: A Low-Cost Approach to Curate Indonesian Ethical and Unethical Instructions
COLING 2025
Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
EMNLP 2025