conftrace_

Safety

414 papers

Papers per year (counts in chart order): 1, 1, 4, 8, 11, 21, 29, 36, 87, 117, 99

Papers

DeformRS: Certifying Input Deformations with Randomized Smoothing AAAI 2022

Safe Online Convex Optimization with Unknown Linear Safety Constraints AAAI 2022

Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes AAAI 2022

Tight Neural Network Verification via Semidefinite Relaxations and Linear Reformulations AAAI 2022

Stability Verification in Stochastic Control Systems via Neural Network Supermartingales AAAI 2022

Exploring Safer Behaviors for Deep Reinforcement Learning AAAI 2022

Preemptive Image Robustification for Protecting Users against Man-in-the-Middle Adversarial Attacks AAAI 2022

Planning to Avoid Side Effects AAAI 2022

Leashing the Inner Demons: Self-Detoxification for Language Models AAAI 2022

‘Beach’ to ‘Bitch’: Inadvertent Unsafe Transcription of Kids’ Content on YouTube AAAI 2022

SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures ACL 2022

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark ACL 2022

Best Arm Identification with Safety Constraints AISTATS 2022

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart CVPR 2022

SafeText: A Benchmark for Exploring Physical Safety in Language Models EMNLP 2022

Red Teaming Language Models with Language Models EMNLP 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents EMNLP 2022

Handling and Presenting Harmful Text in NLP Research EMNLP 2022

Safe Reinforcement Learning by Imagining the Near Future NeurIPS 2021

Anti-Backdoor Learning: Training Clean Models on Poisoned Data NeurIPS 2021

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs NeurIPS 2021

Topological Detection of Trojaned Neural Networks NeurIPS 2021

Safe Policy Optimization with Local Generalized Linear Function Approximations NeurIPS 2021

Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds NeurIPS 2021

Counterexample Guided RL Policy Refinement Using Bayesian Optimization NeurIPS 2021