Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Random Noise Defense Against Query-Based Black-Box Attacks
NIPS 2021
Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods
NIPS 2021
EarFisher: Detecting Wireless Eavesdroppers by Stimulating and Sensing Memory EMR
NSDI 2021
Generating High-Quality Explanations for Navigation in Partially-Revealed Environments
NIPS 2021
Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork
AAAI 2021
Ethically Compliant Sequential Decision Making
AAAI 2021
Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks
JMLR 2021
The Translucent Patch: A Physical and Universal Attack on Object Detectors
CVPR 2021
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models
AAAI 2021
Towards a Unifying Framework for Formal Theories of Novelty
AAAI 2021
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
NIPS 2021
Adversarial Attacks on Graph Classifiers via Bayesian Optimisation
NIPS 2021
Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks
NIPS 2021
Shift Invariance Can Reduce Adversarial Robustness
NIPS 2021
Rethinking Stealthiness of Backdoor Attack against NLP Models
ACL 2021
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
EMNLP 2021
Adversarial Robustness with Non-uniform Perturbations
NIPS 2021
A PAC-Bayes Analysis of Adversarial Robustness
NIPS 2021
A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks
NIPS 2021
Robustness of Graph Neural Networks at Scale
NIPS 2021
Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes
NIPS 2021
Foundations of Symbolic Languages for Model Interpretability
NIPS 2021
Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks
L4DC 2021
Safe Bayesian Optimisation for Controller Design by Utilising the Parameter Space Approach
L4DC 2021
Learning Approximate Forward Reachable Sets Using Separating Kernels
L4DC 2021