AI Safety

2972 directly classified papers

Papers

Certified Defense for Content Based Image Retrieval WACV 2023

Adversarial Robustness in Discontinuous Spaces via Alternating Sampling & Descent WACV 2023

PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning ICCV 2023

RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World ICCV 2023

Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation ICCV 2023

Safe Reinforcement Learning via Probabilistic Logic Shields IJCAI 2023

Ethical By Designer - How to Grow Ethical Designers of Artificial Intelligence (Extended Abstract) IJCAI 2023

A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling CoRL 2023

Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning CoRL 2023

[MASK] Insertion: a robust method for anti-adversarial attacks EACL 2023

Human Control: Definitions and Algorithms UAI 2023

User-Centric Democratization towards Social Value Aligned Medical AI Services IJCAI 2023

State-wise Safe Reinforcement Learning: A Survey IJCAI 2023

Automated Reachability Analysis of Neural Network-Controlled Systems via Adaptive Polytopes L4DC 2023

Designing System Level Synthesis Controllers for Nonlinear Systems with Stability Guarantees L4DC 2023

Adversarial Robustness through Random Weight Sampling NeurIPS 2023

Asymmetric Certified Robustness via Feature-Convex Neural Networks NeurIPS 2023

Towards Stable Backdoor Purification through Feature Shift Tuning NeurIPS 2023

On the Adversarial Robustness of Out-of-distribution Generalization Models NeurIPS 2023

Django: Detecting Trojans in Object Detection Models via Gaussian Focus Calibration NeurIPS 2023

Can we trust the evaluation on ChatGPT? ACL 2023

Maestro: A Gamified Platform for Teaching AI Robustness AAAI 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans AISTATS 2023

Uniformly Conservative Exploration in Reinforcement Learning AISTATS 2023

Does Label Differential Privacy Prevent Label Inference Attacks? AISTATS 2023