Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Data to Defense: The Role of Curation in Aligning Large Language Models Against Safety Compromise
EMNLP 2025
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
CVPR 2025
AI Chatbots as Professional Service Agents: Developing a Professional Identity
EMNLP 2025
Evaluating Cultural and Social Awareness of LLM Web Agents
NAACL 2025
Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent
EMNLP 2025
Maximizing Signal in Human-Model Preference Alignment
AAAI 2025
The discordance between embedded ethics and cultural inference in large language models
EMNLP 2025
SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion
EMNLP 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
EMNLP 2025
Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models
EMNLP 2025
MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
EMNLP 2025
UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation
EMNLP 2025
Joint Vision-Language Social Bias Removal for CLIP
CVPR 2025
A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
EMNLP 2025
The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It
EMNLP 2025
NLP_goats@DravidianLangTech 2025: Detecting AI Written Reviews for Consumer Trust
NAACL 2025
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization
CVPR 2025
Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks
EMNLP 2025
Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional Framework
EMNLP 2025
CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models
NAACL 2025
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?
CVPR 2025
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
CVPR 2025
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models
CVPR 2025
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
CVPR 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
ACL 2025