Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Intrinsic Barriers and Practical Pathways for Human–AI Alignment: An Agreement-Based Complexity Analysis
AAAI 2026
Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research
AAAI 2026
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment
AAAI 2026
Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal
AAAI 2026
Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models
AAAI 2026
Confirmation Bias: A Challenge for Scalable Oversight
AAAI 2026
Detecting Compute Structuring in AI Governance Is Likely Feasible
AAAI 2026
Efficient Switchable Safety Control in LLMs via Magic-Token-Guided Co-Training
AAAI 2026
Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems
AAAI 2026
Safe Multi-agent Reinforcement Learning with Natural Language Constraints
AAAI 2026
Designing Incident Reporting Systems for Harms from General-Purpose AI
AAAI 2026
HumorReject: Decoupling LLM Safety from Refusal Prefix via a Little Humor
AAAI 2026
When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF
AAAI 2026
Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
AAAI 2026
Composable Assurance for AI Alignment: A Framework for Propagating Formal Safety Properties Through MLOps
AAAI 2026
When Proxy Agents Disagree, Do Humans Mirror? Manipulating Human Behavior in Moral Dilemmas Through Agents
AAAI 2026
Beta Distribution Learning for Reliable Roadway Crash Risk Assessment
AAAI 2026
MHB: Medical Hallucination Benchmark for Large Language Models in Complex Clinical Tasks
AAAI 2026
Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation
AAAI 2026
Hashed Watermark as a Filter: A Unified Defense Against Forging and Overwriting Attacks in Neural Network Watermarking
AAAI 2026
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
AAAI 2026
Consensus Learning with Multi-Party Perturbation Triggers for Secure Model Access
AAAI 2026
Probabilistic Safety Verification of Neural Policies via Predicate Abstraction
AAAI 2026
AURA: Affordance-Understanding and Risk-aware Alignment Technique for Large Language Models
AAAI 2026
MoralReason: Generalizable Moral Decision Alignment for LLM Agents Using Reasoning-Level Reinforcement Learning
AAAI 2026