Responsible AI

1,991 directly classified papers

Papers

From Distributional to Overton Pluralism: Investigating Large Language Model Alignment NAACL 2025

On Weaponization-Resistant Large Language Models with Prospect Theoretic Alignment COLING 2025

KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions COLING 2025

PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines NAACL 2025

SafeLawBench: Towards Safe Alignment of Large Language Models ACL 2025

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings ACL 2025

Loki: An Open-Source Tool for Fact Verification COLING 2025

Unfamiliar Finetuning Examples Control How Language Models Hallucinate NAACL 2025

Self-Pluralising Culture Alignment for Large Language Models NAACL 2025

Iterative Multilingual Spectral Attribute Erasure EMNLP 2025

Moral Compass: A Data-Driven Benchmark for Ethical Cognition in AI IJCAI 2025

Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking EMNLP 2025

Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design ACL 2025

A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations EMNLP 2025

TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention ICCV 2025

Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study EMNLP 2025

LLMs Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis IJCNLP 2025

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization IJCNLP 2025

Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics IJCNLP 2025

Small Changes, Large Consequences: Analyzing the Allocational Fairness of LLMs in Hiring Contexts IJCNLP 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 IJCNLP 2025

Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers IJCNLP 2025

MPF: Aligning and Debiasing Language Models post Deployment via Multi-Perspective Fusion IJCNLP 2025

Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization NAACL 2025

Extracting and Understanding the Superficial Knowledge in Alignment NAACL 2025