Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Adversarial Laser Spot: Robust and Covert Physical-World Attack to DNNs
ACML 2022
BackdoorBench: A Comprehensive Benchmark of Backdoor Learning
NIPS 2022
Evolution of Neural Tangent Kernels under Benign and Adversarial Training
NIPS 2022
Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal
NIPS 2022
Error Amplification When Updating Deployed Machine Learning Models
MLHC 2022
Why predicting risk can’t identify ‘risk factors’: empirical assessment of model stability in machine learning across observational health databases
MLHC 2022
Increasing Confidence in Adversarial Robustness Evaluations
NIPS 2022
Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart
CVPR 2022
Transferable 3D Adversarial Textures Using End-to-End Optimization
WACV 2022
Universal Evasion Attacks on Summarization Scoring
EMNLP 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
NIPS 2022
LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness
NIPS 2022
Rethinking Lipschitz Neural Networks and Certified Robustness: A Boolean Function Perspective
NIPS 2022
A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks
NIPS 2022
Your Out-of-Distribution Detection Method is Not Robust!
NIPS 2022
Can Adversarial Training Be Manipulated By Non-Robust Features?
NIPS 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
NIPS 2022
Capturing Failures of Large Language Models via Human Cognitive Biases
NIPS 2022
Weight Perturbation as Defense against Adversarial Word Substitutions
EMNLP 2022
Safety Guarantees for Neural Network Dynamic Systems via Stochastic Barrier Functions
NIPS 2022
Detecting textual adversarial examples through randomized substitution and vote
UAI 2022
Data dependent randomized smoothing
UAI 2022
Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks
NIPS 2021
Anti-Backdoor Learning: Training Clean Models on Poisoned Data
NIPS 2021
Adversarial Neuron Pruning Purifies Backdoored Deep Models
NIPS 2021