Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors
AAAI 2026
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
AAAI 2026
Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security
AAAI 2026
PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems
AAAI 2026
Causal, Strategic, and Combined Responsibility Attribution in Situation Calculus Concurrent Game Structures
AAAI 2026
HAMLET4Fairness: Enhancing Fairness in AI Pipelines Through Human-Centered AutoML and Argumentation
AAAI 2026
Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
AAAI 2026
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
AAAI 2026
Bootstrapping LLMs via Preference-Based Policy Optimization
AAAI 2026
EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers
AAAI 2026
ENCORE: Entropy-guided Reward Composition for Multi-head Safety Reward Models
AAAI 2026
LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation
AAAI 2026
Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment Through Latent Acoustic Pattern Triggers
AAAI 2026
SafeNLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
AAAI 2026
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models
AAAI 2026
WALKSAFE: Risk-aware Graph Random Walk with Bi-GRPO for LLM Safety
AAAI 2026
MirrorShield: Towards Dynamic Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting
AAAI 2026
MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies
AAAI 2026
Perturb Your Data: Paraphrase-Guided Training Data Watermarking
AAAI 2026
Fine-Tuned LLMs Know They Don’t Know: A Parameter-Efficient Approach to Recovering Honesty
AAAI 2026
Enhancing Pre-training Data Detection in LLMs Through Discriminative and Symmetric Prefix Selection
AAAI 2026
Efficient Hallucination Detection: Adaptive Bayesian Estimation of Semantic Entropy with Guided Semantic Exploration
AAAI 2026
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
AAAI 2026
Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation
AAAI 2026
Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment
AAAI 2026