AI Safety

3,026 papers

Papers per year (bar chart; x-axis ticks '15, '20, '25; bar values left to right): 1, 1, 1, 4, 1, 5, 1, 13, 40, 91, 111, 181, 204, 333, 642, 1031, 366

Papers

TWINFUZZ: Dual-Model Fuzzing for Robustness Generalization in Deep Learning AAAI 2026

Resilience in Ambient Multi-Agent LLMs via Decentralized Bio-Autonomic Control and Immune-Inspired Anomaly Detection AAAI 2026

The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation AAAI 2026

SMiLE: Provably Enforcing Global Relational Properties in Neural Networks AAAI 2026

AlignTree: Efficient Defense Against LLM Jailbreak Attacks AAAI 2026

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation AAAI 2026

Silenced Biases: The Dark Side LLMs Learned to Refuse AAAI 2026

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks AAAI 2026

Requirements for Aligned, Dynamic Resolution of Conflicts in Operational Constraints AAAI 2026

Moral Change or Noise? On Problems of Aligning AI with Temporally Unstable Human Feedback AAAI 2026

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment AAAI 2026

Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment AAAI 2026

Selective Weak-to-Strong Generalization AAAI 2026

MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control AAAI 2026

StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models via Style-Aware Audio Jailbreak AAAI 2026

Mitigating Self-Preference by Authorship Obfuscation AAAI 2026

Misalignment from Treating Means as Ends AAAI 2026

STACK: Adversarial Attacks on LLM Safeguard Pipelines AAAI 2026

Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping AAAI 2026

Intrinsic Barriers and Practical Pathways for Human–AI Alignment: An Agreement-Based Complexity Analysis AAAI 2026

Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research AAAI 2026

LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models AAAI 2026

AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment AAAI 2026

Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal AAAI 2026

Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models AAAI 2026