AI Safety

2972 directly classified papers

Papers

Breaking Barriers in Physical-World Adversarial Examples: Improving Robustness and Transferability via Robust Feature AAAI 2025

Adversarial-Inspired Backdoor Defense via Bridging Backdoor and Adversarial Attacks AAAI 2025

Everywhere Attack: Attacking Locally and Globally to Boost Targeted Transferability AAAI 2025

Backdoor Attack on Propagation-based Rumor Detectors AAAI 2025

Crossfire: An Elastic Defense Framework for Graph Neural Networks Under Bit Flip Attacks AAAI 2025

Graph Agent Network: Empowering Nodes with Inference Capabilities for Adversarial Resilience AAAI 2025

Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending Against Poisoning Attacks AAAI 2025

Protecting Model Adaptation from Trojans in the Unlabeled Data AAAI 2025

Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction AAAI 2025

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment AAAI 2025

Debate Helps Weak-to-Strong Generalization AAAI 2025

Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks on Traffic Scene Perception AAAI 2025

Single Character Perturbations Break LLM Alignment AAAI 2025

Towards Computational Foreseeability AAAI 2025

From Gambits to Assurances: Game-Theoretic Integration of Safety and Learning for Interactive Robotics AAAI 2025

Identifying Predictions That Influence the Future: Detecting Performative Concept Drift in Data Streams AAAI 2025

On the Robustness of Distributed Machine Learning Against Transfer Attacks AAAI 2025

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation AAAI 2025

Operationalising Rawlsian Ethics for Fairness in Norm Learning Agents AAAI 2025

Increased Compute Efficiency and the Diffusion of AI Capabilities AAAI 2025

Measuring Error Alignment for Decision-Making Systems AAAI 2025

All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract) AAAI 2025

Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation AAAI 2025

Safe Planner: Empowering Safety Awareness in Large Pre-Trained Models for Robot Task Planning AAAI 2025

ASP-Driven Emergency Planning for Norm Violations in Reinforcement Learning AAAI 2025