Responsible AI

1991 directly classified papers

Papers

TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models ICCV 2025

Which Demographics do LLMs Default to During Annotation? ACL 2025

Modality-Fair Preference Optimization for Trustworthy MLLM Alignment IJCAI 2025

Defining and Quantifying Visual Hallucinations in Vision-Language Models NAACL 2025

Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models NAACL 2025

Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs ACL 2025

Rainbow-Teaming for the Polish Language: A Reproducibility Study NAACL 2025

HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech NAACL 2025

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models ACL 2025

Evaluating and Mitigating Linguistic Discrimination in Large Language Models: Perspectives on Safety Equity and Knowledge Equity IJCAI 2025

Human-Centered Disability Bias Detection in Large Language Models IJCNLP 2025

A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient NAACL 2025

Fostering Digital Inclusion for Low-Resource Nigerian Languages: A Case Study of Igbo and Nigerian Pidgin NAACL 2025

Stealing Training Data from Large Language Models in Decentralized Training through Activation Inversion Attack ACL 2025

shimig@DravidianLangTech2025: Stratification of Abusive content on Women in Social Media NAACL 2025

CUET_Absolute_Zero@DravidianLangTech 2025: Detecting AI-Generated Product Reviews in Malayalam and Tamil Language Using Transformer Models NAACL 2025

Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? ACL 2025

Wisdom from Diversity: Bias Mitigation Through Hybrid Human-LLM Crowds IJCAI 2025

NLP_goats@DravidianLangTech 2025: Detecting AI-Written Reviews for Consumer Trust NAACL 2025

LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones NAACL 2025

GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models ACL 2025

GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation NAACL 2025

Evaluating Cultural and Social Awareness of LLM Web Agents NAACL 2025

PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance ACL 2025

Large Language Models Discriminate Against Speakers of German Dialects EMNLP 2025