AI Safety

2972 directly classified papers

Papers

Using Punctuation as an Adversarial Attack on Deep Learning-Based NLP Systems: An Empirical Study EACL 2023

Gradient-Informed Neural Network Statistical Robustness Estimation AISTATS 2023

Provable Safe Reinforcement Learning with Binary Feedback AISTATS 2023

Out-of-Distribution Detection With Reconstruction Error and Typicality-Based Penalty WACV 2023

Proactive Deepfake Defence via Identity Watermarking WACV 2023

DE-CROP: Data-Efficient Certified Robustness for Pretrained Classifiers WACV 2023

Do Adaptive Active Attacks Pose Greater Risk Than Static Attacks? WACV 2023

NNSplitter: An Active Defense Solution for DNN Model via Automated Weight Obfuscation ICML 2023

Certifying Ensembles: A General Certification Theory with S-Lipschitzness ICML 2023

Understanding Backdoor Attacks through the Adaptability Hypothesis ICML 2023

Adversarial Parameter Attack on Deep Neural Networks ICML 2023

Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability ICML 2023

Analyzing Intentional Behavior in Autonomous Agents under Uncertainty IJCAI 2023

Explanation-Guided Reward Alignment IJCAI 2023

Adversarial Behavior Exclusion for Safe Reinforcement Learning IJCAI 2023

Probabilistic Temporal Logic for Reasoning about Bounded Policies IJCAI 2023

Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents? IJCAI 2023

A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning IJCAI 2023

Efficient Global Robustness Certification of Neural Networks via Interleaving Twin-Network Encoding (Extended Abstract) IJCAI 2023

Sample Efficient Paradigms for Personalized Assessment of Taskable AI Systems IJCAI 2023

TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models ICCV 2023

Among Us: Adversarially Robust Collaborative Perception by Consensus ICCV 2023

Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack ICCV 2023

Towards Robust Model Watermark via Reducing Parametric Vulnerability ICCV 2023

The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data ICCV 2023