Research Explorer
AI Safety (Artificial Intelligence › Core AI)
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Neural Abstractions (NIPS 2022)
One-shot Neural Backdoor Erasing via Adversarial Weight Masking (NIPS 2022)
Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch (NIPS 2022)
Pre-activation Distributions Expose Backdoor Neurons (NIPS 2022)
When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture (NIPS 2022)
Lexicographic Multi-Objective Reinforcement Learning (IJCAI 2022)
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs (EMNLP 2022)
SafeText: A Benchmark for Exploring Physical Safety in Language Models (EMNLP 2022)
Verification and Monitoring for First-Order LTL with Persistence-Preserving Quantification over Finite and Infinite Traces (IJCAI 2022)
A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network (INTERSPEECH 2022)
Towards Practical Certifiable Patch Defense With Vision Transformer (CVPR 2022)
BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning (CVPR 2022)
Certified Patch Robustness via Smoothed Vision Transformers (CVPR 2022)
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free (CVPR 2022)
Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations (CVPR 2022)
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack (CVPR 2022)
Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning (IJCAI 2022)
On the Adversarial Robustness of Causal Algorithmic Recourse (ICML 2022)
Not All Poisons are Created Equal: Robust Training against Data Poisoning (ICML 2022)
Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack (ICML 2022)
Certified Robustness Against Natural Language Attacks by Causal Intervention (ICML 2022)
An Equivalence Between Data Poisoning and Byzantine Gradient Attacks (ICML 2022)
Path-Specific Objectives for Safer Agent Incentives (AAAI 2022)
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models (EMNLP 2022)
GoTube: Scalable Statistical Verification of Continuous-Depth Models (AAAI 2022)