Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Social Debiasing for Fair Multi-modal LLMs
ICCV 2025
Comparing Moral Values in Western English-speaking societies and LLMs with Word Associations
ACL 2025
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
ICCV 2025
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
ICCV 2025
Smaller Large Language Models Can Do Moral Self-Correction
NAACL 2025
Did Translation Models Get More Robust Without Anyone Even Noticing?
ACL 2025
Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding
NAACL 2025
From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models
NAACL 2025
Quantifying Cognitive Bias Induction in LLM-Generated Content
IJCNLP 2025
A Practical Analysis of Human Alignment with *PO
NAACL 2025
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
ICCV 2025
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
ICCV 2025
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming
NAACL 2025
CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications
IJCNLP 2025
TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models
NAACL 2025
Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
ACL 2025
PMPO: A Self-Optimizing Framework for Creating High-Fidelity Measurement Tools for Social Bias in Large Language Models
IJCNLP 2025
B4: A Black-Box Scrubbing Attack on LLM Watermarks
NAACL 2025
“All that Glitters”: Techniques for Evaluations with Unreliable Model and Human Annotations
NAACL 2025
Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs
NAACL 2025
Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
IJCNLP 2025
Beyond Excess and Deficiency: Adaptive Length Bias Mitigation in Reward Models for RLHF
NAACL 2025
Identifying and Mitigating Social Bias Knowledge in Language Models
NAACL 2025
Bias Amplification: Large Language Models as Increasingly Biased Media
IJCNLP 2025
Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models
NAACL 2025