conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
PKAD: Pretrained Knowledge is All You Need to Detect and Mitigate Textual Backdoor Attacks
EMNLP 2024
TrustAgent: Towards Safe and Trustworthy LLM-based Agents
EMNLP 2024
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
EMNLP 2024
Towards Test-Time Refusals via Concept Negation
NIPS 2023
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms
NIPS 2023
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
NIPS 2023
Multi-Agent First Order Constrained Optimization in Policy Space
NIPS 2023
Characterizing the Optimal $0-1$ Loss for Multi-class Classification with a Test-time Attacker
NIPS 2023
Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints
AAAI 2023
CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration
AAAI 2023
Safe Reinforcement Learning via Shielding under Partial Observability
AAAI 2023
Correct-by-Construction Reinforcement Learning of Cardiac Pacemakers from Duration Calculus Requirements
AAAI 2023
SafeLight: A Reinforcement Learning Method toward Collision-Free Traffic Signal Control
AAAI 2023
AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning
AAAI 2023
Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning
AAAI 2023
Safety Verification of Nonlinear Systems with Bayesian Neural Network Controllers
AAAI 2023
Evaluating Model-Free Reinforcement Learning toward Safety-Critical Tasks
AAAI 2023
Rethinking Safe Control in the Presence of Self-Seeking Humans
AAAI 2023
Safety Validation of Learning-Based Autonomous Systems: A Multi-Fidelity Approach
AAAI 2023
Targeted Knowledge Infusion To Make Conversational AI Explainable and Safe
AAAI 2023
Advances in AI for Safety, Equity, and Well-Being on Web and Social Media: Detection, Robustness, Attribution, and Mitigation
AAAI 2023
Combining Runtime Monitoring and Machine Learning with Human Feedback
AAAI 2023
Towards Safe and Resilient Autonomy in Multi-Robot Systems
AAAI 2023
MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning
ACL 2023
Text Adversarial Purification as Defense against Adversarial Attacks
ACL 2023