conftrace_

Safety

414 papers

Papers per year (chart; year labels not preserved): 1, 1, 4, 8, 11, 21, 29, 36, 87, 117, 99

Papers

PKAD: Pretrained Knowledge is All You Need to Detect and Mitigate Textual Backdoor Attacks EMNLP 2024

TrustAgent: Towards Safe and Trustworthy LLM-based Agents EMNLP 2024

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression EMNLP 2024

Towards Test-Time Refusals via Concept Negation NIPS 2023

Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms NIPS 2023

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities NIPS 2023

Multi-Agent First Order Constrained Optimization in Policy Space NIPS 2023

Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker NIPS 2023

Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints AAAI 2023

CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration AAAI 2023

Safe Reinforcement Learning via Shielding under Partial Observability AAAI 2023

Correct-by-Construction Reinforcement Learning of Cardiac Pacemakers from Duration Calculus Requirements AAAI 2023

SafeLight: A Reinforcement Learning Method toward Collision-Free Traffic Signal Control AAAI 2023

AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning AAAI 2023

Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning AAAI 2023

Safety Verification of Nonlinear Systems with Bayesian Neural Network Controllers AAAI 2023

Evaluating Model-Free Reinforcement Learning toward Safety-Critical Tasks AAAI 2023

Rethinking Safe Control in the Presence of Self-Seeking Humans AAAI 2023

Safety Validation of Learning-Based Autonomous Systems: A Multi-Fidelity Approach AAAI 2023

Targeted Knowledge Infusion To Make Conversational AI Explainable and Safe AAAI 2023

Advances in AI for Safety, Equity, and Well-Being on Web and Social Media: Detection, Robustness, Attribution, and Mitigation AAAI 2023

Combining Runtime Monitoring and Machine Learning with Human Feedback AAAI 2023

Towards Safe and Resilient Autonomy in Multi-Robot Systems AAAI 2023

MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning ACL 2023

Text Adversarial Purification as Defense against Adversarial Attacks ACL 2023