Responsible AI

1991 directly classified papers

Papers

Synthetic Paths to Integral Truth: Mitigating Hallucinations Caused by Confirmation Bias with Synthetic Data COLING 2025

Position: Iterative Online-Offline Joint Optimization is Needed to Manage Complex LLM Copyright Risks ICML 2025

DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming NAACL 2025

BasqBBQ: A QA Benchmark for Assessing Social Biases in LLMs for Basque, a Low-Resource Language COLING 2025

What’s the most important value? INVP: INvestigating the Value Priorities of LLMs through Decision-making in Social Scenarios COLING 2025

Making Transparency Advocates: An Educational Approach Towards Better Algorithmic Transparency in Practice AAAI 2025

Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning COLING 2025

Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering COLING 2025

Mitigating Bias in Machine Learning: A Comprehensive Review and Novel Approaches AAAI 2025

An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Model is not a General Substitute for GPT-4 ACL 2025

LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets AAAI 2025

Beyond the Binary: Analysing Transphobic Hate and Harassment Online ACL 2025

Alleviating Hallucinations in Large Language Models via Truthfulness-driven Rank-adaptive LoRA ACL 2025

sudoLLM: On Multi-role Alignment of Language Models EMNLP 2025

Seeing Race, Feeling Bias: Emotion Stereotyping in Multimodal Language Models EMNLP 2025

Standard Quality Criteria Derived from Current NLP Evaluations for Guiding Evaluation Design and Grounding Comparability and AI Compliance Assessments ACL 2025

Saudi-Alignment Benchmark: Assessing LLMs Alignment with Cultural Norms and Domain Knowledge in the Saudi Context EMNLP 2025

Are LLMs Empathetic to All? Investigating the Influence of Multi-Demographic Personas on a Model’s Empathy EMNLP 2025

Data Attribution: A Data-Centric Approach for Trustworthy AI Development AAAI 2025

LLMs as Medical Safety Judges: Evaluating Alignment with Human Annotation in Patient-Facing QA ACL 2025

The AI Race: Why Current Neural Network-based Architectures are a Poor Basis for Artificial General Intelligence AAAI 2025

SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use NAACL 2025

Lessons for Editors of AI Incidents from the AI Incident Database AAAI 2025

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data NAACL 2025

iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss SEMEVAL 2025