Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Certified Defense for Content Based Image Retrieval
WACV 2023
Adversarial Robustness in Discontinuous Spaces via Alternating Sampling & Descent
WACV 2023
PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning
ICCV 2023
RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World
ICCV 2023
Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
ICCV 2023
Safe Reinforcement Learning via Probabilistic Logic Shields
IJCAI 2023
Ethical By Designer - How to Grow Ethical Designers of Artificial Intelligence (Extended Abstract)
IJCAI 2023
A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling
CORL 2023
Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning
CORL 2023
[MASK] Insertion: a robust method for anti-adversarial attacks
EACL 2023
Human Control: Definitions and Algorithms
UAI 2023
User-Centric Democratization towards Social Value Aligned Medical AI Services
IJCAI 2023
State-wise Safe Reinforcement Learning: A Survey
IJCAI 2023
Automated Reachability Analysis of Neural Network-Controlled Systems via Adaptive Polytopes
L4DC 2023
Designing System Level Synthesis Controllers for Nonlinear Systems with Stability Guarantees
L4DC 2023
Adversarial Robustness through Random Weight Sampling
NIPS 2023
Asymmetric Certified Robustness via Feature-Convex Neural Networks
NIPS 2023
Towards Stable Backdoor Purification through Feature Shift Tuning
NIPS 2023
On the Adversarial Robustness of Out-of-distribution Generalization Models
NIPS 2023
Django: Detecting Trojans in Object Detection Models via Gaussian Focus Calibration
NIPS 2023
Can we trust the evaluation on ChatGPT?
ACL 2023
Maestro: A Gamified Platform for Teaching AI Robustness
AAAI 2023
Who Should Predict? Exact Algorithms For Learning to Defer to Humans
AISTATS 2023
Uniformly Conservative Exploration in Reinforcement Learning
AISTATS 2023
Does Label Differential Privacy Prevent Label Inference Attacks?
AISTATS 2023