Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Alignment for Efficient Tool Calling of Large Language Models
EMNLP 2025
SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs
NAACL 2025
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?
NAACL 2025
Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Models
NAACL 2025
Atoxia: Red-teaming Large Language Models with Target Toxic Answers
NAACL 2025
The Lawyer That Never Thinks: Consistency and Fairness as Keys to Reliable AI
ACL 2025
Who Holds the Pen? Caricature and Perspective in LLM Retellings of History
EMNLP 2025
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs
ACL 2025
SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
EMNLP 2025
Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
NAACL 2025
Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency
EMNLP 2025
IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
EMNLP 2025
Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities
EMNLP 2025
The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection
EMNLP 2025
Large Language Models Discriminate Against Speakers of German Dialects
EMNLP 2025
Watermarking Large Language Models: An Unbiased and Low-risk Method
ACL 2025
Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions
EMNLP 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
EMNLP 2025
Identifying Unlearned Data in LLMs via Membership Inference Attacks
EMNLP 2025
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages
EMNLP 2025
Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models
EMNLP 2025
MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
EMNLP 2025
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
EMNLP 2025
Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation
EMNLP 2025
Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
EMNLP 2025