AI Safety

2972 directly classified papers

Papers

How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities EACL 2026

Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs EACL 2026

Systematic Analysis of the Unintentional CSAM-Generation-Potential of Text-to-Image Models WACV 2026

HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single Token EACL 2026

Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval EACL 2026

Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models EACL 2026

Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition WACV 2026

Do LLM hallucination detectors suffer from low-resource effect? EACL 2026

Learning Multilingual Agentic Policy to Control Sycophancy EACL 2026

Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models EACL 2026

ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models EACL 2026

Beyond Names: How Grammatical Gender Markers Bias LLM-based Educational Recommendations EACL 2026

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning EACL 2026

Reasoning about Uncertainty: Do Reasoning Models Know When They Don’t Know? EACL 2026

BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage EACL 2026

FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation EACL 2026

Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks AAAI 2026

Ethical Decision-making with AI: Value Alignment and the Role of Reasoning AAAI 2026

When the Model Said ‘No Comment’, We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified EACL 2026

Risk-Aware Bilingual Spoken Dialogue for Campus Mental Health Support AAAI 2026

AuditAgent: LLM Agent for Risks Auditing in Recommender Systems AAAI 2026

AgentSeer: Visualizing and Evaluating Temporal Actions in Agentic AI Systems AAAI 2026

Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans AAAI 2026

Attacker’s Noise Can Manipulate Your Audio-based LLM in the Real World EACL 2026

From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLMs EACL 2026