Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks
COLING 2022
Cross-document Misinformation Detection based on Event Graph Reasoning
NAACL 2022
A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction
NAACL 2022
On the Machine Learning of Ethical Judgments from Natural Language
NAACL 2022
Predicting Out-of-Distribution Error with the Projection Norm
ICML 2022
Input-Specific Robustness Certification for Randomized Smoothing
AAAI 2022
A Study of the Attention Abnormality in Trojaned BERTs
NAACL 2022
Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
NAACL 2022
Aligning to Social Norms and Values in Interactive Narratives
NAACL 2022
Direct Behavior Specification via Constrained Reinforcement Learning
ICML 2022
Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
ICML 2022
Reachability Constrained Reinforcement Learning
ICML 2022
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning
INTERSPEECH 2022
Backdoor Attacks on the DNN Interpretation System
AAAI 2022
Robust Heterogeneous Graph Neural Networks against Adversarial Attacks
AAAI 2022
Combating Adversaries with Anti-adversaries
AAAI 2022
Provable Guarantees for Understanding Out-of-Distribution Detection
AAAI 2022
CC-CERT: A Probabilistic Approach to Certify General Robustness of Neural Networks
AAAI 2022
Verification of Neural-Network Control Systems by Integrating Taylor Models and Zonotopes
AAAI 2022
Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning
AAAI 2022
PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning
IJCAI 2022
Training OOD Detectors in their Natural Habitats
ICML 2022
Analyzing the Real Vulnerability of Hate Speech Detection Systems against Targeted Intentional Noise
COLING 2022
On the Impact of Spurious Correlation for Out-of-Distribution Detection
AAAI 2022
Enhancing Adversarial Robustness for Deep Metric Learning
CVPR 2022