AI Safety

2972 directly classified papers

Papers

PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks COLING 2022

Cross-document Misinformation Detection based on Event Graph Reasoning NAACL 2022

A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Prediction NAACL 2022

On the Machine Learning of Ethical Judgments from Natural Language NAACL 2022

Predicting Out-of-Distribution Error with the Projection Norm ICML 2022

Input-Specific Robustness Certification for Randomized Smoothing AAAI 2022

A Study of the Attention Abnormality in Trojaned BERTs NAACL 2022

Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy NAACL 2022

Aligning to Social Norms and Values in Interactive Narratives NAACL 2022

Direct Behavior Specification via Constrained Reinforcement Learning ICML 2022

Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation ICML 2022

Reachability Constrained Reinforcement Learning ICML 2022

User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning INTERSPEECH 2022

Backdoor Attacks on the DNN Interpretation System AAAI 2022

Robust Heterogeneous Graph Neural Networks against Adversarial Attacks AAAI 2022

Combating Adversaries with Anti-adversaries AAAI 2022

Provable Guarantees for Understanding Out-of-Distribution Detection AAAI 2022

CC-CERT: A Probabilistic Approach to Certify General Robustness of Neural Networks AAAI 2022

Verification of Neural-Network Control Systems by Integrating Taylor Models and Zonotopes AAAI 2022

Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning AAAI 2022

PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning IJCAI 2022

Training OOD Detectors in their Natural Habitats ICML 2022

Analyzing the Real Vulnerability of Hate Speech Detection Systems against Targeted Intentional Noise COLING 2022

On the Impact of Spurious Correlation for Out-of-Distribution Detection AAAI 2022

Enhancing Adversarial Robustness for Deep Metric Learning CVPR 2022