Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Counterfactual harm
NIPS 2022
Towards Safe Reinforcement Learning with a Safety Editor Policy
NIPS 2022
WeDef: Weakly Supervised Backdoor Defense for Text Classification
EMNLP 2022
ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning
CVPR 2022
Investigating Top-k White-Box and Transferable Black-Box Attack
CVPR 2022
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
CVPR 2022
Better Trigger Inversion Optimization in Backdoor Scanning
CVPR 2022
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
NIPS 2022
Handcrafted Backdoors in Deep Neural Networks
NIPS 2022
Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks
NIPS 2022
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning
ACL 2022
Model AI Assignments 2022
AAAI 2022
Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching
CVPR 2022
Preparing High School Teachers to Integrate AI Methods into STEM Classrooms
AAAI 2022
Authentic Integration of Ethics and AI through Sociotechnical, Problem-Based Learning
AAAI 2022
Differential Assessment of Black-Box AI Agents
AAAI 2022
Cosine Model Watermarking against Ensemble Distillation
AAAI 2022
Exploring the Universal Vulnerability of Prompt-based Learning Paradigm
NAACL 2022
Certified Robustness via Locally Biased Randomized Smoothing
L4DC 2022
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
NIPS 2022
Rethinking the Reverse-engineering of Trojan Triggers
NIPS 2022
Toward Robust Spiking Neural Network Against Adversarial Perturbation
NIPS 2022
Safe Control with Neural Network Dynamic Models
L4DC 2022
EvenNet: Ignoring Odd-Hop Neighbors Improves Robustness of Graph Neural Networks
NIPS 2022
Automatic Termination for Hyperparameter Optimization
AUTOML 2022