Responsible AI

1,991 directly classified papers

Papers

NLP for Counterspeech against Hate and Misinformation (CSHAM) ACL 2025

From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs ACL 2025

DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification ACL 2025

Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation ACL 2025

Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models ACL 2025

Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral ACL 2025

The Impossibility of Fair LLMs ACL 2025

MerryQuery: A Trustworthy LLM-Powered Tool Providing Personalized Support for Educators and Students AAAI 2025

Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation ACL 2025

Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models ACL 2025

The Unreasonable Effectiveness of Open Science in AI: A Replication Study AAAI 2025

Moderating the Generalization of Score-based Generative Model ICCV 2025

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure AAAI 2025

Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding NAACL 2025

Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control CVPR 2025

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation ICCV 2025

Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models CVPR 2025

NeuroReset: LLM Unlearning via Dual Phase Mixed Methodology SemEval 2025

Smaller Large Language Models Can Do Moral Self-Correction NAACL 2025

TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention ICCV 2025

DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models ICCV 2025

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions ICCV 2025

Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models ICCV 2025

On the Mutual Influence of Gender and Occupation in LLM Representations ACL 2025

Social Debiasing for Fair Multi-modal LLMs ICCV 2025