AI Safety

2972 directly classified papers

Papers

Neural Abstractions NIPS 2022

One-shot Neural Backdoor Erasing via Adversarial Weight Masking NIPS 2022

Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch NIPS 2022

Pre-activation Distributions Expose Backdoor Neurons NIPS 2022

When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture NIPS 2022

Lexicographic Multi-Objective Reinforcement Learning IJCAI 2022

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs EMNLP 2022

SafeText: A Benchmark for Exploring Physical Safety in Language Models EMNLP 2022

Verification and Monitoring for First-Order LTL with Persistence-Preserving Quantification over Finite and Infinite Traces IJCAI 2022

A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network INTERSPEECH 2022

Towards Practical Certifiable Patch Defense With Vision Transformer CVPR 2022

BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning CVPR 2022

Certified Patch Robustness via Smoothed Vision Transformers CVPR 2022

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free CVPR 2022

Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations CVPR 2022

Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack CVPR 2022

Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning IJCAI 2022

On the Adversarial Robustness of Causal Algorithmic Recourse ICML 2022

Not All Poisons are Created Equal: Robust Training against Data Poisoning ICML 2022

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack ICML 2022

Certified Robustness Against Natural Language Attacks by Causal Intervention ICML 2022

An Equivalence Between Data Poisoning and Byzantine Gradient Attacks ICML 2022

Path-Specific Objectives for Safer Agent Incentives AAAI 2022

ROSE: Robust Selective Fine-tuning for Pre-trained Language Models EMNLP 2022

GoTube: Scalable Statistical Verification of Continuous-Depth Models AAAI 2022