Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
NLP for Counterspeech against Hate and Misinformation (CSHAM)
ACL 2025
From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs
ACL 2025
DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
ACL 2025
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
ACL 2025
Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models
ACL 2025
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
ACL 2025
The Impossibility of Fair LLMs
ACL 2025
MerryQuery: A Trustworthy LLM-Powered Tool Providing Personalized Support for Educators and Students
AAAI 2025
Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation
ACL 2025
Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
ACL 2025
The Unreasonable Effectiveness of Open Science in AI: A Replication Study
AAAI 2025
Moderating the Generalization of Score-based Generative Model
ICCV 2025
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
AAAI 2025
Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding
NAACL 2025
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
CVPR 2025
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
ICCV 2025
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models
CVPR 2025
NeuroReset : LLM Unlearning via Dual Phase Mixed Methodology
SEMEVAL 2025
Smaller Large Language Models Can Do Moral Self-Correction
NAACL 2025
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
ICCV 2025
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
ICCV 2025
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
ICCV 2025
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
ICCV 2025
On the Mutual Influence of Gender and Occupation in LLM Representations
ACL 2025
Social Debiasing for Fair Multi-modal LLMs
ICCV 2025