Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
EACL 2026
Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs
EACL 2026
Systematic Analysis of the Unintentional CSAM-Generation-Potential of Text-to-Image Models
WACV 2026
HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single Token
EACL 2026
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
EACL 2026
Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models
EACL 2026
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition
WACV 2026
Do LLM hallucination detectors suffer from low-resource effect?
EACL 2026
Learning Multilingual Agentic Policy to Control Sycophancy
EACL 2026
Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models
EACL 2026
ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models
EACL 2026
Beyond Names: How Grammatical Gender Markers Bias LLM-based Educational Recommendations
EACL 2026
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
EACL 2026
Reasoning about Uncertainty: Do Reasoning Models Know When They Don’t Know?
EACL 2026
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
EACL 2026
FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation
EACL 2026
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks
AAAI 2026
Ethical Decision-making with AI: Value Alignment and the Role of Reasoning
AAAI 2026
When the Model Said ‘No Comment’, We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified
EACL 2026
Risk-Aware Bilingual Spoken Dialogue for Campus Mental Health Support
AAAI 2026
AuditAgent: LLM Agent for Risks Auditing in Recommender Systems
AAAI 2026
AgentSeer: Visualizing and Evaluating Temporal Actions in Agentic AI Systems
AAAI 2026
Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans
AAAI 2026
Attacker’s Noise Can Manipulate Your Audio-based LLM in the Real World
EACL 2026
From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLMs
EACL 2026