Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
ACL 2024
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness
ACL 2024
ROSE Doesn’t Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding
ACL 2024
Evaluating Large Language Models for Health-related Queries with Presuppositions
ACL 2024
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection
ACL 2024
Whose Emotions and Moral Sentiments do Language Models Reflect?
ACL 2024
Bias Bluff Busters at FIGNEWS 2024 Shared Task: Developing Guidelines to Make Bias Conscious
ACL 2024
Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models
ACL 2024
Ask LLMs Directly, “What shapes your bias?”: Measuring Social Bias in Large Language Models
ACL 2024
On Shortcuts and Biases: How Finetuned Language Models Distinguish Audience-Specific Instructions in Italian and English
ACL 2024
ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation
ACL 2024
GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction
ACL 2024
Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness
ACL 2024
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
EMNLP 2024
GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets
EMNLP 2024
Rethinking the Role of Proxy Rewards in Language Model Alignment
EMNLP 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality
EMNLP 2024
Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation
EMNLP 2024
Evaluating Psychological Safety of Large Language Models
EMNLP 2024
On the Reliability of Psychological Scales on Large Language Models
EMNLP 2024
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
EMNLP 2024
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models
EMNLP 2024
DetoxLLM: A Framework for Detoxification with Explanations
EMNLP 2024
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
CVPR 2024
MACE: Mass Concept Erasure in Diffusion Models
CVPR 2024