Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Using Punctuation as an Adversarial Attack on Deep Learning-Based NLP Systems: An Empirical Study
EACL 2023
Gradient-Informed Neural Network Statistical Robustness Estimation
AISTATS 2023
Provable Safe Reinforcement Learning with Binary Feedback
AISTATS 2023
Out-of-Distribution Detection With Reconstruction Error and Typicality-Based Penalty
WACV 2023
Proactive Deepfake Defence via Identity Watermarking
WACV 2023
DE-CROP: Data-Efficient Certified Robustness for Pretrained Classifiers
WACV 2023
Do Adaptive Active Attacks Pose Greater Risk Than Static Attacks?
WACV 2023
NNSplitter: An Active Defense Solution for DNN Model via Automated Weight Obfuscation
ICML 2023
Certifying Ensembles: A General Certification Theory with S-Lipschitzness
ICML 2023
Understanding Backdoor Attacks through the Adaptability Hypothesis
ICML 2023
Adversarial Parameter Attack on Deep Neural Networks
ICML 2023
Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability
ICML 2023
Analyzing Intentional Behavior in Autonomous Agents under Uncertainty
IJCAI 2023
Explanation-Guided Reward Alignment
IJCAI 2023
Adversarial Behavior Exclusion for Safe Reinforcement Learning
IJCAI 2023
Probabilistic Temporal Logic for Reasoning about Bounded Policies
IJCAI 2023
Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?
IJCAI 2023
A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning
IJCAI 2023
Efficient Global Robustness Certification of Neural Networks via Interleaving Twin-Network Encoding (Extended Abstract)
IJCAI 2023
Sample Efficient Paradigms for Personalized Assessment of Taskable AI Systems
IJCAI 2023
TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models
ICCV 2023
Among Us: Adversarially Robust Collaborative Perception by Consensus
ICCV 2023
Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack
ICCV 2023
Towards Robust Model Watermark via Reducing Parametric Vulnerability
ICCV 2023
The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
ICCV 2023